You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kafka.apache.org by Justine Olshan <jo...@confluent.io.INVALID> on 2022/07/21 20:53:30 UTC

[DISCUSS] KIP-854 Separate configuration for producer ID expiry

Hey all!

I'd like to start a discussion on my proposal to separate time-based
producer ID expiration from transactional ID expiration by introducing a
new configuration.

The KIP Is pretty small and simple, but will be helpful in controlling
memory usage in brokers -- especially now that by default producers are
idempotent and create producer ID state.

Please take a look and leave any comments you may have!

KIP:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-854+Separate+configuration+for+producer+ID+expiry
JIRA: https://issues.apache.org/jira/browse/KAFKA-14097

Thanks!
Justine

Re: [DISCUSS] KIP-854 Separate configuration for producer ID expiry

Posted by Sagar <sa...@gmail.com>.
Thanks Justine,

I thought -1 might be a good default setting for backward compatibility
reasons. I am not too adamant on it either. Probably worth mentioning in
the description that setting it to -1 would disable the feature?

Other than that, LGTM!

Thanks!
Sagar.

On Fri, Jul 29, 2022 at 8:37 AM Luke Chen <sh...@gmail.com> wrote:

> Hi Jason,
>
> Thanks for the info. I don't strongly insist on making the default to -1
> for backward compatibility.
> If other people in the community also agree with the change, I'm good with
> that.
>
> Thank you.
> Luke
>
>
>
>
> On Fri, Jul 29, 2022 at 5:35 AM Justine Olshan
> <jo...@confluent.io.invalid>
> wrote:
>
> > Thanks Jason, Luke, Sagar, and Kirk,
> >
> > Seems like there is still some debate over the default value. I think
> there
> > is a general consensus that we can reduce the default at some point, but
> > exactly when is still not clear. I do think Jason made a good point about
> > applications taking 1 day to retry. I am interested if there are other
> use
> > cases we didn't consider though.
> >
> > I've also updated the description to reference `delivery.timeout.ms.` I'm
> > not sure if we also need that config to reference this one (the
> > bi-directional reference Kirk mentioned). Let me know if something should
> > still be updated or if something is unclear.
> >
> > Thanks again,
> > Justine
> >
> > On Thu, Jul 28, 2022 at 10:46 AM Kirk True <ki...@kirktrue.pro> wrote:
> >
> > > Hi Justine,
> > >
> > > Thanks for the KIP. I appreciated the background context and clarity
> you
> > > added.
> > >
> > > On Wed, Jul 27, 2022, at 2:57 AM, Sagar wrote:
> > > > Thanks Justine for the KIP. I think it might be better to document
> the
> > > > correlation between the new config and delivery.timeout.ms in the
> > Public
> > > > Interfaces Description.
> > >
> > > +1.
> > >
> > > A bi-directional reference between the two configuration options would
> be
> > > great for clarity. This is especially true given that the value of `
> > > producer.id.expiration.ms, when left at -1, comes from the value of
> > > transactional.id.expiration.ms.`
> > >
> > > Thanks!
> > > Kirk
> > >
> > > >
> > > > Also, I agree with Luke that for now setting a default to -1 should
> be
> > > > good. We can look to switch to 1 day with major release.
> > > >
> > > > Thanks!
> > > > Sagar.
> > > >
> > > > On Wed, Jul 27, 2022 at 9:05 AM Luke Chen <sh...@gmail.com> wrote:
> > > >
> > > > > Hi Justine,
> > > > >
> > > > > Thanks for the KIP.
> > > > > I agree with you that we should try our best to keep backward
> > > > > compatibility, although our intention is to have lower producer id
> > > > > expiration timeout.
> > > > > So, I think we should keep default to -1 IMO.
> > > > > Maybe we change the default to 1 day in next major release (4.0)?
> > > > >
> > > > > Thank you.
> > > > > Luke
> > > > >
> > > > > On Wed, Jul 27, 2022 at 7:13 AM Justine Olshan
> > > > > <jo...@confluent.io.invalid>
> > > > > wrote:
> > > > >
> > > > > > Thanks for taking a look Jason!
> > > > > >
> > > > > > I wondered if we wanted to have a smaller default but wasn't sure
> > > about
> > > > > the
> > > > > > compatibility story -- especially since there is the chance for
> > > producer
> > > > > > IDs to expire silently.
> > > > > > I do think that 1 day is fairly reasonable. If I don't hear any
> > > > > conflicting
> > > > > > opinions I can go ahead and update the default.
> > > > > >
> > > > > > Justine
> > > > > >
> > > > > > On Tue, Jul 26, 2022 at 12:23 PM Jason Gustafson
> > > > > > <ja...@confluent.io.invalid>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Justine,
> > > > > > >
> > > > > > > Thanks for the KIP. Although I hate seeing new configurations,
> I
> > > think
> > > > > > this
> > > > > > > is a good change. Combining these timeout behaviors into a
> single
> > > > > > > configuration was definitely a mistake, but we didn't
> anticipate
> > > the
> > > > > > > problem with the producer id cache. I do wonder if we can make
> > the
> > > > > > default
> > > > > > > a bit lower to reduce the chances that users will hit the same
> > > memory
> > > > > > > issues we have seen. After decoupling this configuration from
> > > > > > > transactional.id.expiration.ms, the new timeout just needs to
> > > cover
> > > > > the
> > > > > > > longest duration that a producer might be retrying the same
> > Produce
> > > > > > > request. 7 days seems too high. Although I think it could go a
> > fair
> > > > > even
> > > > > > > lower, perhaps 1 day is a reasonable place to start?
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Jason
> > > > > > >
> > > > > > > On Mon, Jul 25, 2022 at 9:25 AM Justine Olshan
> > > > > > > <jo...@confluent.io.invalid>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hey Bill,
> > > > > > > > Thanks! I was just going to say that hopefully
> > > > > > > > transactional.id.expiration.ms would also be over the
> delivery
> > > > > > timeout.
> > > > > > > :)
> > > > > > > > Thanks for the +1!
> > > > > > > >
> > > > > > > > Justine
> > > > > > > >
> > > > > > > > On Mon, Jul 25, 2022 at 9:17 AM Bill Bejeck <
> bbejeck@gmail.com
> > >
> > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi Justine,
> > > > > > > > >
> > > > > > > > > I just took another look at the KIP, and I realize my
> > > > > > > question/suggestion
> > > > > > > > > about default values has already been addressed in the
> > > > > > `Compatibility`
> > > > > > > > > section.
> > > > > > > > >
> > > > > > > > > I'm +1 on the KIP.
> > > > > > > > >
> > > > > > > > > -Bill
> > > > > > > > >
> > > > > > > > > On Thu, Jul 21, 2022 at 6:20 PM Bill Bejeck <
> > bbejeck@gmail.com
> > > >
> > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi Justine,
> > > > > > > > > >
> > > > > > > > > > Thanks for the well written KIP, this looks like it will
> > be a
> > > > > > useful
> > > > > > > > > > addition.
> > > > > > > > > >
> > > > > > > > > > Overall the KIP looks good to me, I have one
> > > question/comment.
> > > > > > > > > >
> > > > > > > > > > You mentioned that setting the `
> producer.id.expiration.ms`
> > > less
> > > > > > than
> > > > > > > > the
> > > > > > > > > > delivery timeout could lead to duplicates, which makes
> > > sense.  To
> > > > > > > help
> > > > > > > > > > avoid this situation, do we want to consider a default
> > value
> > > that
> > > > > > is
> > > > > > > > the
> > > > > > > > > > same as the delivery timeout?
> > > > > > > > > >
> > > > > > > > > > Thanks again for the KIP.
> > > > > > > > > >
> > > > > > > > > > Bill
> > > > > > > > > >
> > > > > > > > > > On Thu, Jul 21, 2022 at 4:54 PM Justine Olshan
> > > > > > > > > > <jo...@confluent.io.invalid> wrote:
> > > > > > > > > >
> > > > > > > > > >> Hey all!
> > > > > > > > > >>
> > > > > > > > > >> I'd like to start a discussion on my proposal to
> separate
> > > > > > time-based
> > > > > > > > > >> producer ID expiration from transactional ID expiration
> by
> > > > > > > > introducing a
> > > > > > > > > >> new configuration.
> > > > > > > > > >>
> > > > > > > > > >> The KIP Is pretty small and simple, but will be helpful
> in
> > > > > > > controlling
> > > > > > > > > >> memory usage in brokers -- especially now that by
> default
> > > > > > producers
> > > > > > > > are
> > > > > > > > > >> idempotent and create producer ID state.
> > > > > > > > > >>
> > > > > > > > > >> Please take a look and leave any comments you may have!
> > > > > > > > > >>
> > > > > > > > > >> KIP:
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-854+Separate+configuration+for+producer+ID+expiry
> > > > > > > > > >> JIRA: https://issues.apache.org/jira/browse/KAFKA-14097
> > > > > > > > > >>
> > > > > > > > > >> Thanks!
> > > > > > > > > >> Justine
> > > > > > > > > >>
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-854 Separate configuration for producer ID expiry

Posted by Luke Chen <sh...@gmail.com>.
Hi Jason,

Thanks for the info. I don't strongly insist on making the default to -1
for backward compatibility.
If other people in the community also agree with the change, I'm good with
that.

Thank you.
Luke




On Fri, Jul 29, 2022 at 5:35 AM Justine Olshan <jo...@confluent.io.invalid>
wrote:

> Thanks Jason, Luke, Sagar, and Kirk,
>
> Seems like there is still some debate over the default value. I think there
> is a general consensus that we can reduce the default at some point, but
> exactly when is still not clear. I do think Jason made a good point about
> applications taking 1 day to retry. I am interested if there are other use
> cases we didn't consider though.
>
> I've also updated the description to reference `delivery.timeout.ms.` I'm
> not sure if we also need that config to reference this one (the
> bi-directional reference Kirk mentioned). Let me know if something should
> still be updated or if something is unclear.
>
> Thanks again,
> Justine
>
> On Thu, Jul 28, 2022 at 10:46 AM Kirk True <ki...@kirktrue.pro> wrote:
>
> > Hi Justine,
> >
> > Thanks for the KIP. I appreciated the background context and clarity you
> > added.
> >
> > On Wed, Jul 27, 2022, at 2:57 AM, Sagar wrote:
> > > Thanks Justine for the KIP. I think it might be better to document the
> > > correlation between the new config and delivery.timeout.ms in the
> Public
> > > Interfaces Description.
> >
> > +1.
> >
> > A bi-directional reference between the two configuration options would be
> > great for clarity. This is especially true given that the value of `
> > producer.id.expiration.ms, when left at -1, comes from the value of
> > transactional.id.expiration.ms.`
> >
> > Thanks!
> > Kirk
> >
> > >
> > > Also, I agree with Luke that for now setting a default to -1 should be
> > > good. We can look to switch to 1 day with major release.
> > >
> > > Thanks!
> > > Sagar.
> > >
> > > On Wed, Jul 27, 2022 at 9:05 AM Luke Chen <sh...@gmail.com> wrote:
> > >
> > > > Hi Justine,
> > > >
> > > > Thanks for the KIP.
> > > > I agree with you that we should try our best to keep backward
> > > > compatibility, although our intention is to have lower producer id
> > > > expiration timeout.
> > > > So, I think we should keep default to -1 IMO.
> > > > Maybe we change the default to 1 day in next major release (4.0)?
> > > >
> > > > Thank you.
> > > > Luke
> > > >
> > > > On Wed, Jul 27, 2022 at 7:13 AM Justine Olshan
> > > > <jo...@confluent.io.invalid>
> > > > wrote:
> > > >
> > > > > Thanks for taking a look Jason!
> > > > >
> > > > > I wondered if we wanted to have a smaller default but wasn't sure
> > about
> > > > the
> > > > > compatibility story -- especially since there is the chance for
> > producer
> > > > > IDs to expire silently.
> > > > > I do think that 1 day is fairly reasonable. If I don't hear any
> > > > conflicting
> > > > > opinions I can go ahead and update the default.
> > > > >
> > > > > Justine
> > > > >
> > > > > On Tue, Jul 26, 2022 at 12:23 PM Jason Gustafson
> > > > > <ja...@confluent.io.invalid>
> > > > > wrote:
> > > > >
> > > > > > Hi Justine,
> > > > > >
> > > > > > Thanks for the KIP. Although I hate seeing new configurations, I
> > think
> > > > > this
> > > > > > is a good change. Combining these timeout behaviors into a single
> > > > > > configuration was definitely a mistake, but we didn't anticipate
> > the
> > > > > > problem with the producer id cache. I do wonder if we can make
> the
> > > > > default
> > > > > > a bit lower to reduce the chances that users will hit the same
> > memory
> > > > > > issues we have seen. After decoupling this configuration from
> > > > > > transactional.id.expiration.ms, the new timeout just needs to
> > cover
> > > > the
> > > > > > longest duration that a producer might be retrying the same
> Produce
> > > > > > request. 7 days seems too high. Although I think it could go a
> fair
> > > > even
> > > > > > lower, perhaps 1 day is a reasonable place to start?
> > > > > >
> > > > > > Thanks,
> > > > > > Jason
> > > > > >
> > > > > > On Mon, Jul 25, 2022 at 9:25 AM Justine Olshan
> > > > > > <jo...@confluent.io.invalid>
> > > > > > wrote:
> > > > > >
> > > > > > > Hey Bill,
> > > > > > > Thanks! I was just going to say that hopefully
> > > > > > > transactional.id.expiration.ms would also be over the delivery
> > > > > timeout.
> > > > > > :)
> > > > > > > Thanks for the +1!
> > > > > > >
> > > > > > > Justine
> > > > > > >
> > > > > > > On Mon, Jul 25, 2022 at 9:17 AM Bill Bejeck <bbejeck@gmail.com
> >
> > > > wrote:
> > > > > > >
> > > > > > > > Hi Justine,
> > > > > > > >
> > > > > > > > I just took another look at the KIP, and I realize my
> > > > > > question/suggestion
> > > > > > > > about default values has already been addressed in the
> > > > > `Compatibility`
> > > > > > > > section.
> > > > > > > >
> > > > > > > > I'm +1 on the KIP.
> > > > > > > >
> > > > > > > > -Bill
> > > > > > > >
> > > > > > > > On Thu, Jul 21, 2022 at 6:20 PM Bill Bejeck <
> bbejeck@gmail.com
> > >
> > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi Justine,
> > > > > > > > >
> > > > > > > > > Thanks for the well written KIP, this looks like it will
> be a
> > > > > useful
> > > > > > > > > addition.
> > > > > > > > >
> > > > > > > > > Overall the KIP looks good to me, I have one
> > question/comment.
> > > > > > > > >
> > > > > > > > > You mentioned that setting the `producer.id.expiration.ms`
> > less
> > > > > than
> > > > > > > the
> > > > > > > > > delivery timeout could lead to duplicates, which makes
> > sense.  To
> > > > > > help
> > > > > > > > > avoid this situation, do we want to consider a default
> value
> > that
> > > > > is
> > > > > > > the
> > > > > > > > > same as the delivery timeout?
> > > > > > > > >
> > > > > > > > > Thanks again for the KIP.
> > > > > > > > >
> > > > > > > > > Bill
> > > > > > > > >
> > > > > > > > > On Thu, Jul 21, 2022 at 4:54 PM Justine Olshan
> > > > > > > > > <jo...@confluent.io.invalid> wrote:
> > > > > > > > >
> > > > > > > > >> Hey all!
> > > > > > > > >>
> > > > > > > > >> I'd like to start a discussion on my proposal to separate
> > > > > time-based
> > > > > > > > >> producer ID expiration from transactional ID expiration by
> > > > > > > introducing a
> > > > > > > > >> new configuration.
> > > > > > > > >>
> > > > > > > > >> The KIP Is pretty small and simple, but will be helpful in
> > > > > > controlling
> > > > > > > > >> memory usage in brokers -- especially now that by default
> > > > > producers
> > > > > > > are
> > > > > > > > >> idempotent and create producer ID state.
> > > > > > > > >>
> > > > > > > > >> Please take a look and leave any comments you may have!
> > > > > > > > >>
> > > > > > > > >> KIP:
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-854+Separate+configuration+for+producer+ID+expiry
> > > > > > > > >> JIRA: https://issues.apache.org/jira/browse/KAFKA-14097
> > > > > > > > >>
> > > > > > > > >> Thanks!
> > > > > > > > >> Justine
> > > > > > > > >>
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-854 Separate configuration for producer ID expiry

Posted by Justine Olshan <jo...@confluent.io.INVALID>.
Thanks Jason, Luke, Sagar, and Kirk,

Seems like there is still some debate over the default value. I think there
is a general consensus that we can reduce the default at some point, but
exactly when is still not clear. I do think Jason made a good point about
applications taking 1 day to retry. I am interested if there are other use
cases we didn't consider though.

I've also updated the description to reference `delivery.timeout.ms.` I'm
not sure if we also need that config to reference this one (the
bi-directional reference Kirk mentioned). Let me know if something should
still be updated or if something is unclear.

Thanks again,
Justine

On Thu, Jul 28, 2022 at 10:46 AM Kirk True <ki...@kirktrue.pro> wrote:

> Hi Justine,
>
> Thanks for the KIP. I appreciated the background context and clarity you
> added.
>
> On Wed, Jul 27, 2022, at 2:57 AM, Sagar wrote:
> > Thanks Justine for the KIP. I think it might be better to document the
> > correlation between the new config and delivery.timeout.ms in the Public
> > Interfaces Description.
>
> +1.
>
> A bi-directional reference between the two configuration options would be
> great for clarity. This is especially true given that the value of `
> producer.id.expiration.ms, when left at -1, comes from the value of
> transactional.id.expiration.ms.`
>
> Thanks!
> Kirk
>
> >
> > Also, I agree with Luke that for now setting a default to -1 should be
> > good. We can look to switch to 1 day with major release.
> >
> > Thanks!
> > Sagar.
> >
> > On Wed, Jul 27, 2022 at 9:05 AM Luke Chen <sh...@gmail.com> wrote:
> >
> > > Hi Justine,
> > >
> > > Thanks for the KIP.
> > > I agree with you that we should try our best to keep backward
> > > compatibility, although our intention is to have lower producer id
> > > expiration timeout.
> > > So, I think we should keep default to -1 IMO.
> > > Maybe we change the default to 1 day in next major release (4.0)?
> > >
> > > Thank you.
> > > Luke
> > >
> > > On Wed, Jul 27, 2022 at 7:13 AM Justine Olshan
> > > <jo...@confluent.io.invalid>
> > > wrote:
> > >
> > > > Thanks for taking a look Jason!
> > > >
> > > > I wondered if we wanted to have a smaller default but wasn't sure
> about
> > > the
> > > > compatibility story -- especially since there is the chance for
> producer
> > > > IDs to expire silently.
> > > > I do think that 1 day is fairly reasonable. If I don't hear any
> > > conflicting
> > > > opinions I can go ahead and update the default.
> > > >
> > > > Justine
> > > >
> > > > On Tue, Jul 26, 2022 at 12:23 PM Jason Gustafson
> > > > <ja...@confluent.io.invalid>
> > > > wrote:
> > > >
> > > > > Hi Justine,
> > > > >
> > > > > Thanks for the KIP. Although I hate seeing new configurations, I
> think
> > > > this
> > > > > is a good change. Combining these timeout behaviors into a single
> > > > > configuration was definitely a mistake, but we didn't anticipate
> the
> > > > > problem with the producer id cache. I do wonder if we can make the
> > > > default
> > > > > a bit lower to reduce the chances that users will hit the same
> memory
> > > > > issues we have seen. After decoupling this configuration from
> > > > > transactional.id.expiration.ms, the new timeout just needs to
> cover
> > > the
> > > > > longest duration that a producer might be retrying the same Produce
> > > > > request. 7 days seems too high. Although I think it could go a fair
> > > even
> > > > > lower, perhaps 1 day is a reasonable place to start?
> > > > >
> > > > > Thanks,
> > > > > Jason
> > > > >
> > > > > On Mon, Jul 25, 2022 at 9:25 AM Justine Olshan
> > > > > <jo...@confluent.io.invalid>
> > > > > wrote:
> > > > >
> > > > > > Hey Bill,
> > > > > > Thanks! I was just going to say that hopefully
> > > > > > transactional.id.expiration.ms would also be over the delivery
> > > > timeout.
> > > > > :)
> > > > > > Thanks for the +1!
> > > > > >
> > > > > > Justine
> > > > > >
> > > > > > On Mon, Jul 25, 2022 at 9:17 AM Bill Bejeck <bb...@gmail.com>
> > > wrote:
> > > > > >
> > > > > > > Hi Justine,
> > > > > > >
> > > > > > > I just took another look at the KIP, and I realize my
> > > > > question/suggestion
> > > > > > > about default values has already been addressed in the
> > > > `Compatibility`
> > > > > > > section.
> > > > > > >
> > > > > > > I'm +1 on the KIP.
> > > > > > >
> > > > > > > -Bill
> > > > > > >
> > > > > > > On Thu, Jul 21, 2022 at 6:20 PM Bill Bejeck <bbejeck@gmail.com
> >
> > > > wrote:
> > > > > > >
> > > > > > > > Hi Justine,
> > > > > > > >
> > > > > > > > Thanks for the well written KIP, this looks like it will be a
> > > > useful
> > > > > > > > addition.
> > > > > > > >
> > > > > > > > Overall the KIP looks good to me, I have one
> question/comment.
> > > > > > > >
> > > > > > > > You mentioned that setting the `producer.id.expiration.ms`
> less
> > > > than
> > > > > > the
> > > > > > > > delivery timeout could lead to duplicates, which makes
> sense.  To
> > > > > help
> > > > > > > > avoid this situation, do we want to consider a default value
> that
> > > > is
> > > > > > the
> > > > > > > > same as the delivery timeout?
> > > > > > > >
> > > > > > > > Thanks again for the KIP.
> > > > > > > >
> > > > > > > > Bill
> > > > > > > >
> > > > > > > > On Thu, Jul 21, 2022 at 4:54 PM Justine Olshan
> > > > > > > > <jo...@confluent.io.invalid> wrote:
> > > > > > > >
> > > > > > > >> Hey all!
> > > > > > > >>
> > > > > > > >> I'd like to start a discussion on my proposal to separate
> > > > time-based
> > > > > > > >> producer ID expiration from transactional ID expiration by
> > > > > > introducing a
> > > > > > > >> new configuration.
> > > > > > > >>
> > > > > > > >> The KIP Is pretty small and simple, but will be helpful in
> > > > > controlling
> > > > > > > >> memory usage in brokers -- especially now that by default
> > > > producers
> > > > > > are
> > > > > > > >> idempotent and create producer ID state.
> > > > > > > >>
> > > > > > > >> Please take a look and leave any comments you may have!
> > > > > > > >>
> > > > > > > >> KIP:
> > > > > > > >>
> > > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-854+Separate+configuration+for+producer+ID+expiry
> > > > > > > >> JIRA: https://issues.apache.org/jira/browse/KAFKA-14097
> > > > > > > >>
> > > > > > > >> Thanks!
> > > > > > > >> Justine
> > > > > > > >>
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-854 Separate configuration for producer ID expiry

Posted by Kirk True <ki...@kirktrue.pro>.
Hi Justine,

Thanks for the KIP. I appreciated the background context and clarity you added.

On Wed, Jul 27, 2022, at 2:57 AM, Sagar wrote:
> Thanks Justine for the KIP. I think it might be better to document the
> correlation between the new config and delivery.timeout.ms in the Public
> Interfaces Description.

+1.

A bi-directional reference between the two configuration options would be great for clarity. This is especially true given that the value of `producer.id.expiration.ms, when left at -1, comes from the value of transactional.id.expiration.ms.`

Thanks!
Kirk

> 
> Also, I agree with Luke that for now setting a default to -1 should be
> good. We can look to switch to 1 day with major release.
> 
> Thanks!
> Sagar.
> 
> On Wed, Jul 27, 2022 at 9:05 AM Luke Chen <sh...@gmail.com> wrote:
> 
> > Hi Justine,
> >
> > Thanks for the KIP.
> > I agree with you that we should try our best to keep backward
> > compatibility, although our intention is to have lower producer id
> > expiration timeout.
> > So, I think we should keep default to -1 IMO.
> > Maybe we change the default to 1 day in next major release (4.0)?
> >
> > Thank you.
> > Luke
> >
> > On Wed, Jul 27, 2022 at 7:13 AM Justine Olshan
> > <jo...@confluent.io.invalid>
> > wrote:
> >
> > > Thanks for taking a look Jason!
> > >
> > > I wondered if we wanted to have a smaller default but wasn't sure about
> > the
> > > compatibility story -- especially since there is the chance for producer
> > > IDs to expire silently.
> > > I do think that 1 day is fairly reasonable. If I don't hear any
> > conflicting
> > > opinions I can go ahead and update the default.
> > >
> > > Justine
> > >
> > > On Tue, Jul 26, 2022 at 12:23 PM Jason Gustafson
> > > <ja...@confluent.io.invalid>
> > > wrote:
> > >
> > > > Hi Justine,
> > > >
> > > > Thanks for the KIP. Although I hate seeing new configurations, I think
> > > this
> > > > is a good change. Combining these timeout behaviors into a single
> > > > configuration was definitely a mistake, but we didn't anticipate the
> > > > problem with the producer id cache. I do wonder if we can make the
> > > default
> > > > a bit lower to reduce the chances that users will hit the same memory
> > > > issues we have seen. After decoupling this configuration from
> > > > transactional.id.expiration.ms, the new timeout just needs to cover
> > the
> > > > longest duration that a producer might be retrying the same Produce
> > > > request. 7 days seems too high. Although I think it could go a fair
> > even
> > > > lower, perhaps 1 day is a reasonable place to start?
> > > >
> > > > Thanks,
> > > > Jason
> > > >
> > > > On Mon, Jul 25, 2022 at 9:25 AM Justine Olshan
> > > > <jo...@confluent.io.invalid>
> > > > wrote:
> > > >
> > > > > Hey Bill,
> > > > > Thanks! I was just going to say that hopefully
> > > > > transactional.id.expiration.ms would also be over the delivery
> > > timeout.
> > > > :)
> > > > > Thanks for the +1!
> > > > >
> > > > > Justine
> > > > >
> > > > > On Mon, Jul 25, 2022 at 9:17 AM Bill Bejeck <bb...@gmail.com>
> > wrote:
> > > > >
> > > > > > Hi Justine,
> > > > > >
> > > > > > I just took another look at the KIP, and I realize my
> > > > question/suggestion
> > > > > > about default values has already been addressed in the
> > > `Compatibility`
> > > > > > section.
> > > > > >
> > > > > > I'm +1 on the KIP.
> > > > > >
> > > > > > -Bill
> > > > > >
> > > > > > On Thu, Jul 21, 2022 at 6:20 PM Bill Bejeck <bb...@gmail.com>
> > > wrote:
> > > > > >
> > > > > > > Hi Justine,
> > > > > > >
> > > > > > > Thanks for the well written KIP, this looks like it will be a
> > > useful
> > > > > > > addition.
> > > > > > >
> > > > > > > Overall the KIP looks good to me, I have one question/comment.
> > > > > > >
> > > > > > > You mentioned that setting the `producer.id.expiration.ms` less
> > > than
> > > > > the
> > > > > > > delivery timeout could lead to duplicates, which makes sense.  To
> > > > help
> > > > > > > avoid this situation, do we want to consider a default value that
> > > is
> > > > > the
> > > > > > > same as the delivery timeout?
> > > > > > >
> > > > > > > Thanks again for the KIP.
> > > > > > >
> > > > > > > Bill
> > > > > > >
> > > > > > > On Thu, Jul 21, 2022 at 4:54 PM Justine Olshan
> > > > > > > <jo...@confluent.io.invalid> wrote:
> > > > > > >
> > > > > > >> Hey all!
> > > > > > >>
> > > > > > >> I'd like to start a discussion on my proposal to separate
> > > time-based
> > > > > > >> producer ID expiration from transactional ID expiration by
> > > > > introducing a
> > > > > > >> new configuration.
> > > > > > >>
> > > > > > >> The KIP Is pretty small and simple, but will be helpful in
> > > > controlling
> > > > > > >> memory usage in brokers -- especially now that by default
> > > producers
> > > > > are
> > > > > > >> idempotent and create producer ID state.
> > > > > > >>
> > > > > > >> Please take a look and leave any comments you may have!
> > > > > > >>
> > > > > > >> KIP:
> > > > > > >>
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-854+Separate+configuration+for+producer+ID+expiry
> > > > > > >> JIRA: https://issues.apache.org/jira/browse/KAFKA-14097
> > > > > > >>
> > > > > > >> Thanks!
> > > > > > >> Justine
> > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> 

Re: [DISCUSS] KIP-854 Separate configuration for producer ID expiry

Posted by Sagar <sa...@gmail.com>.
Hey Justine,

Thanks for that. In a way you are right, that setting the new config to -1
is equivalent to setting it equal to transactional.id.expiration.ms . The
only reason I mentioned the -1 case is it gives the users the ability to
disable this feature (which mayn't even be desirable based on what Jason
described above) and still be able to re-configure
transactional.id.expiration.ms. Otherwise, everytime users change
transactional.id.expiration.ms, they will need to also remember and change
this new config. Again I don't know how frequently this config is changed
or should users even disable this feature, but that was the reasoning I
had. If you think it isn't necessary, we can remove it.

On Sat, Jul 30, 2022 at 3:08 AM Justine Olshan <jo...@confluent.io.invalid>
wrote:

> Hi all,
>
> I've updated the KIP to include the new default of 1 day and information
> about -1 in the description of the config.
> I wonder though if including -1 makes sense now that it is not the default
> value. Is there a benefit for manually setting -1 vs manually setting the
> value that transactional.id.expiration.ms has?
>
> Let me know your thoughts.
> Thanks,
> Justine
>
> On Fri, Jul 29, 2022 at 10:54 AM Ismael Juma <is...@juma.me.uk> wrote:
>
> > +1 for having 1 day as the default and for including this change in the
> > release notes.
> >
> > Ismael
> >
> > On Wed, Jul 27, 2022 at 9:16 AM Jason Gustafson
> <jason@confluent.io.invalid
> > >
> > wrote:
> >
> > > I don't think a major release is a requirement for a default change for
> > > what it's worth. I do appreciate that there is a preference for not
> > rocking
> > > the boat though. For a little bit of background here, the problem we
> > > have encountered in production since the idempotent producer became the
> > > default is OOM errors due to huge numbers of producerIds that had to be
> > > retained in the cache for 7 days. It is hard to prevent use cases from
> > > emerging where producers are used and discarded rapidly. We will be
> > using a
> > > lower value for sure, but it would also be nice to reduce the
> likelihood
> > > for the community to see this problem. The benefit of the caching
> > > diminishes quickly over time since it is primarily meant to handle
> > producer
> > > retry windows. I do not think there is much difference between 1 days
> > and 7
> > > days from an application perspective, but it is a huge difference for
> the
> > > broker's memory usage.
> > >
> > > Best,
> > > Jason
> > >
> > > On Wed, Jul 27, 2022 at 2:57 AM Sagar <sa...@gmail.com>
> wrote:
> > >
> > > > Thanks Justine for the KIP. I think it might be better to document
> the
> > > > correlation between the new config and delivery.timeout.ms in the
> > Public
> > > > Interfaces Description.
> > > >
> > > > Also, I agree with Luke that for now setting a default to -1 should
> be
> > > > good. We can look to switch to 1 day with major release.
> > > >
> > > > Thanks!
> > > > Sagar.
> > > >
> > > > On Wed, Jul 27, 2022 at 9:05 AM Luke Chen <sh...@gmail.com> wrote:
> > > >
> > > > > Hi Justine,
> > > > >
> > > > > Thanks for the KIP.
> > > > > I agree with you that we should try our best to keep backward
> > > > > compatibility, although our intention is to have lower producer id
> > > > > expiration timeout.
> > > > > So, I think we should keep default to -1 IMO.
> > > > > Maybe we change the default to 1 day in next major release (4.0)?
> > > > >
> > > > > Thank you.
> > > > > Luke
> > > > >
> > > > > On Wed, Jul 27, 2022 at 7:13 AM Justine Olshan
> > > > > <jo...@confluent.io.invalid>
> > > > > wrote:
> > > > >
> > > > > > Thanks for taking a look Jason!
> > > > > >
> > > > > > I wondered if we wanted to have a smaller default but wasn't sure
> > > about
> > > > > the
> > > > > > compatibility story -- especially since there is the chance for
> > > > producer
> > > > > > IDs to expire silently.
> > > > > > I do think that 1 day is fairly reasonable. If I don't hear any
> > > > > conflicting
> > > > > > opinions I can go ahead and update the default.
> > > > > >
> > > > > > Justine
> > > > > >
> > > > > > On Tue, Jul 26, 2022 at 12:23 PM Jason Gustafson
> > > > > > <ja...@confluent.io.invalid>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Justine,
> > > > > > >
> > > > > > > Thanks for the KIP. Although I hate seeing new configurations,
> I
> > > > think
> > > > > > this
> > > > > > > is a good change. Combining these timeout behaviors into a
> single
> > > > > > > configuration was definitely a mistake, but we didn't
> anticipate
> > > the
> > > > > > > problem with the producer id cache. I do wonder if we can make
> > the
> > > > > > default
> > > > > > > a bit lower to reduce the chances that users will hit the same
> > > memory
> > > > > > > issues we have seen. After decoupling this configuration from
> > > > > > > transactional.id.expiration.ms, the new timeout just needs to
> > > cover
> > > > > the
> > > > > > > longest duration that a producer might be retrying the same
> > Produce
> > > > > > > request. 7 days seems too high. Although I think it could go a
> > fair
> > > > > even
> > > > > > > lower, perhaps 1 day is a reasonable place to start?
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Jason
> > > > > > >
> > > > > > > On Mon, Jul 25, 2022 at 9:25 AM Justine Olshan
> > > > > > > <jo...@confluent.io.invalid>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hey Bill,
> > > > > > > > Thanks! I was just going to say that hopefully
> > > > > > > > transactional.id.expiration.ms would also be over the
> delivery
> > > > > > timeout.
> > > > > > > :)
> > > > > > > > Thanks for the +1!
> > > > > > > >
> > > > > > > > Justine
> > > > > > > >
> > > > > > > > On Mon, Jul 25, 2022 at 9:17 AM Bill Bejeck <
> bbejeck@gmail.com
> > >
> > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi Justine,
> > > > > > > > >
> > > > > > > > > I just took another look at the KIP, and I realize my
> > > > > > > question/suggestion
> > > > > > > > > about default values has already been addressed in the
> > > > > > `Compatibility`
> > > > > > > > > section.
> > > > > > > > >
> > > > > > > > > I'm +1 on the KIP.
> > > > > > > > >
> > > > > > > > > -Bill
> > > > > > > > >
> > > > > > > > > On Thu, Jul 21, 2022 at 6:20 PM Bill Bejeck <
> > bbejeck@gmail.com
> > > >
> > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi Justine,
> > > > > > > > > >
> > > > > > > > > > Thanks for the well written KIP, this looks like it will
> > be a
> > > > > > useful
> > > > > > > > > > addition.
> > > > > > > > > >
> > > > > > > > > > Overall the KIP looks good to me, I have one
> > > question/comment.
> > > > > > > > > >
> > > > > > > > > > You mentioned that setting the `
> producer.id.expiration.ms`
> > > > less
> > > > > > than
> > > > > > > > the
> > > > > > > > > > delivery timeout could lead to duplicates, which makes
> > sense.
> > > > To
> > > > > > > help
> > > > > > > > > > avoid this situation, do we want to consider a default
> > value
> > > > that
> > > > > > is
> > > > > > > > the
> > > > > > > > > > same as the delivery timeout?
> > > > > > > > > >
> > > > > > > > > > Thanks again for the KIP.
> > > > > > > > > >
> > > > > > > > > > Bill
> > > > > > > > > >
> > > > > > > > > > On Thu, Jul 21, 2022 at 4:54 PM Justine Olshan
> > > > > > > > > > <jo...@confluent.io.invalid> wrote:
> > > > > > > > > >
> > > > > > > > > >> Hey all!
> > > > > > > > > >>
> > > > > > > > > >> I'd like to start a discussion on my proposal to
> separate
> > > > > > time-based
> > > > > > > > > >> producer ID expiration from transactional ID expiration
> by
> > > > > > > > introducing a
> > > > > > > > > >> new configuration.
> > > > > > > > > >>
> > > > > > > > > >> The KIP Is pretty small and simple, but will be helpful
> in
> > > > > > > controlling
> > > > > > > > > >> memory usage in brokers -- especially now that by
> default
> > > > > > producers
> > > > > > > > are
> > > > > > > > > >> idempotent and create producer ID state.
> > > > > > > > > >>
> > > > > > > > > >> Please take a look and leave any comments you may have!
> > > > > > > > > >>
> > > > > > > > > >> KIP:
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-854+Separate+configuration+for+producer+ID+expiry
> > > > > > > > > >> JIRA: https://issues.apache.org/jira/browse/KAFKA-14097
> > > > > > > > > >>
> > > > > > > > > >> Thanks!
> > > > > > > > > >> Justine
> > > > > > > > > >>
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-854 Separate configuration for producer ID expiry

Posted by Luke Chen <sh...@gmail.com>.
+1 for internal configuration to reduce the complexity.
We can externalize it if we have some requirements in the future.

Thanks.
Luke

On Sat, Aug 20, 2022 at 2:13 AM David Jacot <da...@gmail.com> wrote:

> Sounds good. Thanks, Justine.
>
> Le ven. 19 août 2022 à 19:38, Justine Olshan <jolshan@confluent.io.invalid
> >
> a écrit :
>
> > Hi all,
> >
> > Followed up with David and Ismael offline.
> > Ismael explained that we probably don't want to increase complexity and
> > didn't think the value needed to be modified beyond tests. I agree with
> > this so I've decided to go with an internal configuration (with the same
> > default as mentioned above).
> >
> > From a user's perspective, there will be no change. Because this is
> > internal, we decided to keep the name the internal variable used.
> >
> > Let me know if anyone has any further concerns.
> > Thanks,
> > Justine
> >
>

Re: [DISCUSS] KIP-854 Separate configuration for producer ID expiry

Posted by David Jacot <da...@gmail.com>.
Sounds good. Thanks, Justine.

Le ven. 19 août 2022 à 19:38, Justine Olshan <jo...@confluent.io.invalid>
a écrit :

> Hi all,
>
> Followed up with David and Ismael offline.
> Ismael explained that we probably don't want to increase complexity and
> didn't think the value needed to be modified beyond tests. I agree with
> this so I've decided to go with an internal configuration (with the same
> default as mentioned above).
>
> From a user's perspective, there will be no change. Because this is
> internal, we decided to keep the name the internal variable used.
>
> Let me know if anyone has any further concerns.
> Thanks,
> Justine
>

Re: [DISCUSS] KIP-854 Separate configuration for producer ID expiry

Posted by Justine Olshan <jo...@confluent.io.INVALID>.
Hi all,

Followed up with David and Ismael offline.
Ismael explained that we probably don't want to increase complexity and
didn't think the value needed to be modified beyond tests. I agree with
this so I've decided to go with an internal configuration (with the same
default as mentioned above).

From a user's perspective, there will be no change. Because this is
internal, we decided to keep the name the internal variable used.

Let me know if anyone has any further concerns.
Thanks,
Justine

Re: [DISCUSS] KIP-854 Separate configuration for producer ID expiry

Posted by Justine Olshan <jo...@confluent.io.INVALID>.
Hi David,

I chose the name to match the variable name of the existing hardcoded
value. I also thought it was clearer what was happening.

I'm not sure how to follow the pattern above in a way that is not overly
verbose. It would have to be something like
producer.id.remove.expired.producer.id.cleanup.interval.ms (which matches
the "producer.id" prefix of the other added config) or
pid.remove.expired.pid.cleanup.interval.ms.

Both of these seemed repetitive. So there's something like
remove.expired.producer.id.interval.ms? This are starting to get away from
the existing configs though.
Let me know what you think.

Justine

On Thu, Aug 18, 2022 at 8:40 AM David Jacot <dj...@confluent.io.invalid>
wrote:

> Given that we already have
> `transaction.abort.timed.out.transaction.cleanup.interval.ms` and
> `transaction.remove.expired.transaction.cleanup.interval.ms`, it seems
> OK to add another one for our case here. Regarding the name, I would
> follow the pattern that we use for those two existing configs. We can
> define it as an internal config as well. This allows you to change it
> for your integration tests but does not advertise it.
>
> Best,
> David
>
> On Thu, Aug 18, 2022 at 5:34 PM Justine Olshan
> <jo...@confluent.io.invalid> wrote:
> >
> > Hi Ismael,
> >
> > We can do this if we think that an external configuration is not
> necessary.
> > Just wondering, is there a reason we don't want an external configuration
> > here?
> >
> > Thanks,
> > Justine
> >
> > On Wed, Aug 17, 2022 at 12:30 PM Ismael Juma <is...@juma.me.uk> wrote:
> >
> > > Why does this have to be an external config? We can provide an internal
> > > mechanism to configure this, no?
> > >
> > > Ismael
> > >
> > > On Wed, Aug 17, 2022 at 9:22 AM Justine Olshan
> > > <jo...@confluent.io.invalid>
> > > wrote:
> > >
> > > > Hey all,
> > > > Quick update to this KIP. While working on the PR and tests I
> realized
> > > that
> > > > we have a hardcoded value for how often we clean up producer IDs.
> > > Currently
> > > > this value is 10 minutes and is defined in LogManager.
> > > > I thought for better testing and ease of use of the new
> configuration, we
> > > > should also be able to configure the cleanup interval.
> > > >
> > > > Here is the new configuration I'm hoping to add. I also added it to
> the
> > > > KIP.
> > > > name: producer.id.expiration.check.interval.ms
> > > > description: The interval at which to remove producer IDs that have
> > > expired
> > > > due to producer.id.expiration.ms passing
> > > > default: 600000 (10 minutes)
> > > > valid values: [1,...]
> > > > priority: low
> > > > update-mode: read-only
> > > >
> > > > I left the default as the current hardcoded value to avoid
> disruptions.
> > > If
> > > > there are any issues with this change let me know.
> > > > Thanks,
> > > > Justine
> > > >
> > > > On Fri, Aug 5, 2022 at 1:40 PM Justine Olshan <jo...@confluent.io>
> > > > wrote:
> > > >
> > > > > Awesome. Thanks Tom!
> > > > >
> > > > > I plan to open this KIP vote at the start of next week.
> > > > > Thanks all for the discussion! Let me know if there is anything
> else.
> > > :)
> > > > >
> > > > > Justine
> > > > >
> > > > > On Wed, Aug 3, 2022 at 11:32 AM Tom Bentley <tb...@redhat.com>
> > > wrote:
> > > > >
> > > > >> Hi Justine,
> > > > >>
> > > > >> That all seems reasonable to me, thanks!
> > > > >>
> > > > >> On Wed, 3 Aug 2022 at 19:14, Justine Olshan
> > > > <jolshan@confluent.io.invalid
> > > > >> >
> > > > >> wrote:
> > > > >>
> > > > >> > Hi Tom and Ismael,
> > > > >> >
> > > > >> > 1. Yes, there are definitely many ways to improve this issue
> and I
> > > > plan
> > > > >> to
> > > > >> > write followup KIPs to address some of the larger changes.
> > > > >> > Just wanted to get this simple fix in as a short term measure to
> > > > prevent
> > > > >> > issues with too many producer IDs in the cache. Stay tuned :)
> > > > >> >
> > > > >> > 2. I did have some offline discussion about informing the
> client. I
> > > > >> think
> > > > >> > for this specific KIP the default behavior in practice should
> not
> > > > change
> > > > >> > enough to require this information to go back to the client. In
> > > other
> > > > >> > words, a reasonable configuration should not regress behavior.
> > > > However,
> > > > >> > with the further changes I mention in 1, perhaps this is
> something
> > > we
> > > > >> want
> > > > >> > to do. And yes -- unfortunately the current state of Kafka is no
> > > > longer
> > > > >> > totally consistent with KIP-98. This is something we probably
> want
> > > to
> > > > >> > clarify in the future.
> > > > >> >
> > > > >> > 3. I will update the config to mention it is not dynamic. I
> think
> > > > since
> > > > >> the
> > > > >> > transactional id configuration is read-only, this should be too.
> > > > >> >
> > > > >> > 4. I can update this wording.
> > > > >> >
> > > > >> > 5. I think there are definitely benefits to the name `
> > > > >> > idempotent.pid.expiration.ms` but there are other ways this
> could
> > > > cause
> > > > >> > confusion. And to be clear -- the configuration can expire a
> > > producer
> > > > ID
> > > > >> > for a transactional producer as long as there isn't an ongoing
> > > > >> transaction.
> > > > >> >
> > > > >> > Let me know if you have any questions and thanks for taking a
> look!
> > > > >> >
> > > > >> > Justine
> > > > >> >
> > > > >> > On Wed, Aug 3, 2022 at 9:30 AM Ismael Juma <is...@juma.me.uk>
> > > wrote:
> > > > >> >
> > > > >> > > Regarding 1, more can certainly be done, but I think it would
> be
> > > > >> > > complementary. As such, I think this KIP stands on its own and
> > > > >> additional
> > > > >> > > improvements can be handled via future KIPs (unless Justine
> wants
> > > to
> > > > >> > > combine things, of course).
> > > > >> > >
> > > > >> > > Ismael
> > > > >> > >
> > > > >> > > On Wed, Aug 3, 2022 at 9:12 AM Tom Bentley <
> tbentley@redhat.com>
> > > > >> wrote:
> > > > >> > >
> > > > >> > > > Hi Justine,
> > > > >> > > >
> > > > >> > > > Thanks for the KIP! I can see that this is a pragmatic
> attempt
> > > to
> > > > >> > > address a
> > > > >> > > > nasty problem. I have a few questions:
> > > > >> > > >
> > > > >> > > > 1. The KIP makes the problem significantly harder to
> trigger,
> > > but
> > > > >> > doesn't
> > > > >> > > > eliminate it entirely. How confident are you that it will be
> > > > >> sufficient
> > > > >> > > in
> > > > >> > > > practice? We can point to applications which are creating
> > > > idempotent
> > > > >> > > > producers at a high rate and say they're broken, but that
> > > doesn't
> > > > do
> > > > >> > > > anything to defend the broker from an interaction pattern
> that
> > > > >> differs
> > > > >> > > only
> > > > >> > > > in rate from a "good application". Did you consider a new
> quota
> > > to
> > > > >> > limit
> > > > >> > > > the rate at which a (principal, clientId) can allocate new
> PIDs?
> > > > >> > > >
> > > > >> > > > 2. The KIP contains this sentence: "when an idempotent
> > > producer’s
> > > > ID
> > > > >> > > > expires, it silently loses its idempotency guarantees."
> That's
> > > at
> > > > >> odds
> > > > >> > > with
> > > > >> > > > my reading of "PID expiration" in the KIP-98 design[1], but
> it
> > > > does
> > > > >> > seem
> > > > >> > > > consistent with a (brief!) look at the code. I accept that
> the
> > > > risk
> > > > >> > > should
> > > > >> > > > be minimal so long as the expiration time is > the
> producer's
> > > > >> delivery
> > > > >> > > > timeout, but it would still be nice if we could detect this
> > > > >> situation
> > > > >> > and
> > > > >> > > > return an error to the client. Is there a reason for the
> > > apparent
> > > > >> > > deviation
> > > > >> > > > from KIP-98 (or am I misreading the code?)
> > > > >> > > >
> > > > >> > > > 3. Could the KIP be explicit on whether the new config will
> be
> > > > >> > > dynamically
> > > > >> > > > changeable?
> > > > >> > > >
> > > > >> > > > 4. The description of producer.id.expiration.ms mentions
> the
> > > > >> > > > ProducerStateManager, which will mean nothing to a normal
> user.
> > > We
> > > > >> > could
> > > > >> > > > probably change it to "a topic partition leader" without
> loss of
> > > > >> > meaning.
> > > > >> > > >
> > > > >> > > > 5. The description also says "Producer IDs will not expire
> > > while a
> > > > >> > > > transaction associated to them is still ongoing." To me this
> > > > >> suggests
> > > > >> > > that
> > > > >> > > > a more intuitive name for this config (from the user PoV)
> would
> > > > >> include
> > > > >> > > > "idempotent", since it does not cover the transactional
> case. (I
> > > > >> would
> > > > >> > > > suggest "idempotent.pid.expiration.ms" (c.f.
> > > > >> > > > transactional.id.expiration.ms),
> > > > >> > > > but the distinction between "id" and "pid" is easily
> missed–even
> > > > if
> > > > >> > it's
> > > > >> > > > technically correct–so I'm not sure it's better than what
> you're
> > > > >> > > > proposing). I only mention it in case it prompts someone
> else to
> > > > >> find a
> > > > >> > > > better name.
> > > > >> > > >
> > > > >> > > > Thanks again,
> > > > >> > > >
> > > > >> > > > Tom
> > > > >> > > >
> > > > >> > > > [1]:
> > > > >> > > >
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > >
> > >
> https://docs.google.com/document/d/11Jqy_GjUGtdXJK94XGsEIK7CP1SnQGdp2eF0wSw9ra8/edit#heading=h.loujdamc9ptj
> > > > >> > > >
> > > > >> > > > On Tue, 2 Aug 2022 at 22:00, Justine Olshan
> > > > >> > <jolshan@confluent.io.invalid
> > > > >> > > >
> > > > >> > > > wrote:
> > > > >> > > >
> > > > >> > > > > I've updated the KIP to make the new minimum value 1 and
> > > remove
> > > > >> the
> > > > >> > -1
> > > > >> > > > > configuration.
> > > > >> > > > > I've also added the low priority to the configuration and
> > > edited
> > > > >> the
> > > > >> > > > > description as Ismael mentioned.
> > > > >> > > > >
> > > > >> > > > > I'm thinking about bringing this KIP to a vote soon! Let
> me
> > > know
> > > > >> if
> > > > >> > > there
> > > > >> > > > > are any other comments or questions.
> > > > >> > > > >
> > > > >> > > > > Thanks,
> > > > >> > > > > Justine
> > > > >> > > > >
> > > > >> > > > > On Tue, Aug 2, 2022 at 9:02 AM Jason Gustafson
> > > > >> > > > <jason@confluent.io.invalid
> > > > >> > > > > >
> > > > >> > > > > wrote:
> > > > >> > > > >
> > > > >> > > > > > I agree with Ismael that we should remove -1. I think we
> > > tend
> > > > to
> > > > >> > view
> > > > >> > > > the
> > > > >> > > > > > coupling of these behaviors into a single configuration
> as a
> > > > >> > mistake,
> > > > >> > > > so
> > > > >> > > > > > it's a little odd to keep it (even if in a weakened
> form).
> > > > >> > > > > >
> > > > >> > > > > > -Jason
> > > > >> > > > > >
> > > > >> > > > > > On Sat, Jul 30, 2022 at 7:37 AM Ismael Juma <
> > > > ismael@juma.me.uk>
> > > > >> > > wrote:
> > > > >> > > > > >
> > > > >> > > > > > > I would remove -1 altogether. Two more comments:
> > > > >> > > > > > >
> > > > >> > > > > > > 1. The current description kind of leads people
> towards
> > > > >> aligning
> > > > >> > > the
> > > > >> > > > > > config
> > > > >> > > > > > > with delivery.timeout.ms. Is that what we want? We
> could
> > > > say
> > > > >> it
> > > > >> > > > should
> > > > >> > > > > > be
> > > > >> > > > > > > higher than delivery.timeout.ms but indicate that the
> > > > >> default is
> > > > >> > > > > usually
> > > > >> > > > > > > fine. The main reason to reduce it would be to save
> > > memory,
> > > > I
> > > > >> > > guess.
> > > > >> > > > > > > 2. Each config has a priority, we should specify it
> for
> > > this
> > > > >> one.
> > > > >> > > I'm
> > > > >> > > > > > > assuming it will be "low".
> > > > >> > > > > > >
> > > > >> > > > > > > Ismael
> > > > >> > > > > > >
> > > > >> > > > > > > On Fri, Jul 29, 2022 at 2:38 PM Justine Olshan
> > > > >> > > > > > > <jo...@confluent.io.invalid>
> > > > >> > > > > > > wrote:
> > > > >> > > > > > >
> > > > >> > > > > > > > Hi all,
> > > > >> > > > > > > >
> > > > >> > > > > > > > I've updated the KIP to include the new default of
> 1 day
> > > > and
> > > > >> > > > > > information
> > > > >> > > > > > > > about -1 in the description of the config.
> > > > >> > > > > > > > I wonder though if including -1 makes sense now
> that it
> > > is
> > > > >> not
> > > > >> > > the
> > > > >> > > > > > > default
> > > > >> > > > > > > > value. Is there a benefit for manually setting -1 vs
> > > > >> manually
> > > > >> > > > setting
> > > > >> > > > > > the
> > > > >> > > > > > > > value that transactional.id.expiration.ms has?
> > > > >> > > > > > > >
> > > > >> > > > > > > > Let me know your thoughts.
> > > > >> > > > > > > > Thanks,
> > > > >> > > > > > > > Justine
> > > > >> > > > > > > >
> > > > >> > > > > > > > On Fri, Jul 29, 2022 at 10:54 AM Ismael Juma <
> > > > >> > ismael@juma.me.uk>
> > > > >> > > > > > wrote:
> > > > >> > > > > > > >
> > > > >> > > > > > > > > +1 for having 1 day as the default and for
> including
> > > > this
> > > > >> > > change
> > > > >> > > > in
> > > > >> > > > > > the
> > > > >> > > > > > > > > release notes.
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > Ismael
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > On Wed, Jul 27, 2022 at 9:16 AM Jason Gustafson
> > > > >> > > > > > > > <jason@confluent.io.invalid
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > > wrote:
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > > I don't think a major release is a requirement
> for a
> > > > >> > default
> > > > >> > > > > change
> > > > >> > > > > > > for
> > > > >> > > > > > > > > > what it's worth. I do appreciate that there is a
> > > > >> preference
> > > > >> > > for
> > > > >> > > > > not
> > > > >> > > > > > > > > rocking
> > > > >> > > > > > > > > > the boat though. For a little bit of background
> > > here,
> > > > >> the
> > > > >> > > > problem
> > > > >> > > > > > we
> > > > >> > > > > > > > > > have encountered in production since the
> idempotent
> > > > >> > producer
> > > > >> > > > > became
> > > > >> > > > > > > the
> > > > >> > > > > > > > > > default is OOM errors due to huge numbers of
> > > > producerIds
> > > > >> > that
> > > > >> > > > had
> > > > >> > > > > > to
> > > > >> > > > > > > be
> > > > >> > > > > > > > > > retained in the cache for 7 days. It is hard to
> > > > prevent
> > > > >> use
> > > > >> > > > cases
> > > > >> > > > > > > from
> > > > >> > > > > > > > > > emerging where producers are used and discarded
> > > > >> rapidly. We
> > > > >> > > > will
> > > > >> > > > > be
> > > > >> > > > > > > > > using a
> > > > >> > > > > > > > > > lower value for sure, but it would also be nice
> to
> > > > >> reduce
> > > > >> > the
> > > > >> > > > > > > > likelihood
> > > > >> > > > > > > > > > for the community to see this problem. The
> benefit
> > > of
> > > > >> the
> > > > >> > > > caching
> > > > >> > > > > > > > > > diminishes quickly over time since it is
> primarily
> > > > >> meant to
> > > > >> > > > > handle
> > > > >> > > > > > > > > producer
> > > > >> > > > > > > > > > retry windows. I do not think there is much
> > > difference
> > > > >> > > between
> > > > >> > > > 1
> > > > >> > > > > > days
> > > > >> > > > > > > > > and 7
> > > > >> > > > > > > > > > days from an application perspective, but it is
> a
> > > huge
> > > > >> > > > difference
> > > > >> > > > > > for
> > > > >> > > > > > > > the
> > > > >> > > > > > > > > > broker's memory usage.
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > > > Best,
> > > > >> > > > > > > > > > Jason
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > > > On Wed, Jul 27, 2022 at 2:57 AM Sagar <
> > > > >> > > > sagarmeansocean@gmail.com
> > > > >> > > > > >
> > > > >> > > > > > > > wrote:
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > > > > Thanks Justine for the KIP. I think it might
> be
> > > > >> better to
> > > > >> > > > > > document
> > > > >> > > > > > > > the
> > > > >> > > > > > > > > > > correlation between the new config and
> > > > >> > delivery.timeout.ms
> > > > >> > > > in
> > > > >> > > > > > the
> > > > >> > > > > > > > > Public
> > > > >> > > > > > > > > > > Interfaces Description.
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > Also, I agree with Luke that for now setting a
> > > > >> default to
> > > > >> > > -1
> > > > >> > > > > > should
> > > > >> > > > > > > > be
> > > > >> > > > > > > > > > > good. We can look to switch to 1 day with
> major
> > > > >> release.
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > Thanks!
> > > > >> > > > > > > > > > > Sagar.
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > On Wed, Jul 27, 2022 at 9:05 AM Luke Chen <
> > > > >> > > showuon@gmail.com
> > > > >> > > > >
> > > > >> > > > > > > wrote:
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > > Hi Justine,
> > > > >> > > > > > > > > > > >
> > > > >> > > > > > > > > > > > Thanks for the KIP.
> > > > >> > > > > > > > > > > > I agree with you that we should try our
> best to
> > > > keep
> > > > >> > > > backward
> > > > >> > > > > > > > > > > > compatibility, although our intention is to
> have
> > > > >> lower
> > > > >> > > > > producer
> > > > >> > > > > > > id
> > > > >> > > > > > > > > > > > expiration timeout.
> > > > >> > > > > > > > > > > > So, I think we should keep default to -1
> IMO.
> > > > >> > > > > > > > > > > > Maybe we change the default to 1 day in next
> > > major
> > > > >> > > release
> > > > >> > > > > > (4.0)?
> > > > >> > > > > > > > > > > >
> > > > >> > > > > > > > > > > > Thank you.
> > > > >> > > > > > > > > > > > Luke
> > > > >> > > > > > > > > > > >
> > > > >> > > > > > > > > > > > On Wed, Jul 27, 2022 at 7:13 AM Justine
> Olshan
> > > > >> > > > > > > > > > > > <jo...@confluent.io.invalid>
> > > > >> > > > > > > > > > > > wrote:
> > > > >> > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > Thanks for taking a look Jason!
> > > > >> > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > I wondered if we wanted to have a smaller
> > > > default
> > > > >> but
> > > > >> > > > > wasn't
> > > > >> > > > > > > sure
> > > > >> > > > > > > > > > about
> > > > >> > > > > > > > > > > > the
> > > > >> > > > > > > > > > > > > compatibility story -- especially since
> there
> > > is
> > > > >> the
> > > > >> > > > chance
> > > > >> > > > > > for
> > > > >> > > > > > > > > > > producer
> > > > >> > > > > > > > > > > > > IDs to expire silently.
> > > > >> > > > > > > > > > > > > I do think that 1 day is fairly
> reasonable.
> > > If I
> > > > >> > don't
> > > > >> > > > hear
> > > > >> > > > > > any
> > > > >> > > > > > > > > > > > conflicting
> > > > >> > > > > > > > > > > > > opinions I can go ahead and update the
> > > default.
> > > > >> > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > Justine
> > > > >> > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > On Tue, Jul 26, 2022 at 12:23 PM Jason
> > > Gustafson
> > > > >> > > > > > > > > > > > > <ja...@confluent.io.invalid>
> > > > >> > > > > > > > > > > > > wrote:
> > > > >> > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > Hi Justine,
> > > > >> > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > Thanks for the KIP. Although I hate
> seeing
> > > new
> > > > >> > > > > > > configurations,
> > > > >> > > > > > > > I
> > > > >> > > > > > > > > > > think
> > > > >> > > > > > > > > > > > > this
> > > > >> > > > > > > > > > > > > > is a good change. Combining these
> timeout
> > > > >> behaviors
> > > > >> > > > into
> > > > >> > > > > a
> > > > >> > > > > > > > single
> > > > >> > > > > > > > > > > > > > configuration was definitely a mistake,
> but
> > > we
> > > > >> > didn't
> > > > >> > > > > > > > anticipate
> > > > >> > > > > > > > > > the
> > > > >> > > > > > > > > > > > > > problem with the producer id cache. I do
> > > > wonder
> > > > >> if
> > > > >> > we
> > > > >> > > > can
> > > > >> > > > > > > make
> > > > >> > > > > > > > > the
> > > > >> > > > > > > > > > > > > default
> > > > >> > > > > > > > > > > > > > a bit lower to reduce the chances that
> users
> > > > >> will
> > > > >> > hit
> > > > >> > > > the
> > > > >> > > > > > > same
> > > > >> > > > > > > > > > memory
> > > > >> > > > > > > > > > > > > > issues we have seen. After decoupling
> this
> > > > >> > > > configuration
> > > > >> > > > > > from
> > > > >> > > > > > > > > > > > > > transactional.id.expiration.ms, the new
> > > > timeout
> > > > >> > just
> > > > >> > > > > needs
> > > > >> > > > > > > to
> > > > >> > > > > > > > > > cover
> > > > >> > > > > > > > > > > > the
> > > > >> > > > > > > > > > > > > > longest duration that a producer might
> be
> > > > >> retrying
> > > > >> > > the
> > > > >> > > > > same
> > > > >> > > > > > > > > Produce
> > > > >> > > > > > > > > > > > > > request. 7 days seems too high.
> Although I
> > > > >> think it
> > > > >> > > > could
> > > > >> > > > > > go
> > > > >> > > > > > > a
> > > > >> > > > > > > > > fair
> > > > >> > > > > > > > > > > > even
> > > > >> > > > > > > > > > > > > > lower, perhaps 1 day is a reasonable
> place
> > > to
> > > > >> > start?
> > > > >> > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > Thanks,
> > > > >> > > > > > > > > > > > > > Jason
> > > > >> > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > On Mon, Jul 25, 2022 at 9:25 AM Justine
> > > Olshan
> > > > >> > > > > > > > > > > > > > <jo...@confluent.io.invalid>
> > > > >> > > > > > > > > > > > > > wrote:
> > > > >> > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > Hey Bill,
> > > > >> > > > > > > > > > > > > > > Thanks! I was just going to say that
> > > > hopefully
> > > > >> > > > > > > > > > > > > > > transactional.id.expiration.ms would
> also
> > > > be
> > > > >> > over
> > > > >> > > > the
> > > > >> > > > > > > > delivery
> > > > >> > > > > > > > > > > > > timeout.
> > > > >> > > > > > > > > > > > > > :)
> > > > >> > > > > > > > > > > > > > > Thanks for the +1!
> > > > >> > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > Justine
> > > > >> > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > On Mon, Jul 25, 2022 at 9:17 AM Bill
> > > Bejeck
> > > > <
> > > > >> > > > > > > > bbejeck@gmail.com
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > > > > > wrote:
> > > > >> > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > Hi Justine,
> > > > >> > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > I just took another look at the KIP,
> > > and I
> > > > >> > > realize
> > > > >> > > > my
> > > > >> > > > > > > > > > > > > > question/suggestion
> > > > >> > > > > > > > > > > > > > > > about default values has already
> been
> > > > >> addressed
> > > > >> > > in
> > > > >> > > > > the
> > > > >> > > > > > > > > > > > > `Compatibility`
> > > > >> > > > > > > > > > > > > > > > section.
> > > > >> > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > I'm +1 on the KIP.
> > > > >> > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > -Bill
> > > > >> > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > On Thu, Jul 21, 2022 at 6:20 PM Bill
> > > > Bejeck
> > > > >> <
> > > > >> > > > > > > > > bbejeck@gmail.com
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > > > wrote:
> > > > >> > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > Hi Justine,
> > > > >> > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > Thanks for the well written KIP,
> this
> > > > >> looks
> > > > >> > > like
> > > > >> > > > it
> > > > >> > > > > > > will
> > > > >> > > > > > > > > be a
> > > > >> > > > > > > > > > > > > useful
> > > > >> > > > > > > > > > > > > > > > > addition.
> > > > >> > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > Overall the KIP looks good to me,
> I
> > > have
> > > > >> one
> > > > >> > > > > > > > > > question/comment.
> > > > >> > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > You mentioned that setting the `
> > > > >> > > > > > > > producer.id.expiration.ms`
> > > > >> > > > > > > > > > > less
> > > > >> > > > > > > > > > > > > than
> > > > >> > > > > > > > > > > > > > > the
> > > > >> > > > > > > > > > > > > > > > > delivery timeout could lead to
> > > > duplicates,
> > > > >> > > which
> > > > >> > > > > > makes
> > > > >> > > > > > > > > sense.
> > > > >> > > > > > > > > > > To
> > > > >> > > > > > > > > > > > > > help
> > > > >> > > > > > > > > > > > > > > > > avoid this situation, do we want
> to
> > > > >> consider
> > > > >> > a
> > > > >> > > > > > default
> > > > >> > > > > > > > > value
> > > > >> > > > > > > > > > > that
> > > > >> > > > > > > > > > > > > is
> > > > >> > > > > > > > > > > > > > > the
> > > > >> > > > > > > > > > > > > > > > > same as the delivery timeout?
> > > > >> > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > Thanks again for the KIP.
> > > > >> > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > Bill
> > > > >> > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > On Thu, Jul 21, 2022 at 4:54 PM
> > > Justine
> > > > >> > Olshan
> > > > >> > > > > > > > > > > > > > > > > <jo...@confluent.io.invalid>
> wrote:
> > > > >> > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > >> Hey all!
> > > > >> > > > > > > > > > > > > > > > >>
> > > > >> > > > > > > > > > > > > > > > >> I'd like to start a discussion
> on my
> > > > >> > proposal
> > > > >> > > to
> > > > >> > > > > > > > separate
> > > > >> > > > > > > > > > > > > time-based
> > > > >> > > > > > > > > > > > > > > > >> producer ID expiration from
> > > > >> transactional ID
> > > > >> > > > > > > expiration
> > > > >> > > > > > > > by
> > > > >> > > > > > > > > > > > > > > introducing a
> > > > >> > > > > > > > > > > > > > > > >> new configuration.
> > > > >> > > > > > > > > > > > > > > > >>
> > > > >> > > > > > > > > > > > > > > > >> The KIP Is pretty small and
> simple,
> > > but
> > > > >> will
> > > > >> > > be
> > > > >> > > > > > > helpful
> > > > >> > > > > > > > in
> > > > >> > > > > > > > > > > > > > controlling
> > > > >> > > > > > > > > > > > > > > > >> memory usage in brokers --
> especially
> > > > now
> > > > >> > that
> > > > >> > > > by
> > > > >> > > > > > > > default
> > > > >> > > > > > > > > > > > > producers
> > > > >> > > > > > > > > > > > > > > are
> > > > >> > > > > > > > > > > > > > > > >> idempotent and create producer ID
> > > > state.
> > > > >> > > > > > > > > > > > > > > > >>
> > > > >> > > > > > > > > > > > > > > > >> Please take a look and leave any
> > > > comments
> > > > >> > you
> > > > >> > > > may
> > > > >> > > > > > > have!
> > > > >> > > > > > > > > > > > > > > > >>
> > > > >> > > > > > > > > > > > > > > > >> KIP:
> > > > >> > > > > > > > > > > > > > > > >>
> > > > >> > > > > > > > > > > > > > > > >>
> > > > >> > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > >
> > > > >> > > > > > > > > > > >
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > >
> > > > >> > > > > > > >
> > > > >> > > > > > >
> > > > >> > > > > >
> > > > >> > > > >
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-854+Separate+configuration+for+producer+ID+expiry
> > > > >> > > > > > > > > > > > > > > > >> JIRA:
> > > > >> > > > > > > https://issues.apache.org/jira/browse/KAFKA-14097
> > > > >> > > > > > > > > > > > > > > > >>
> > > > >> > > > > > > > > > > > > > > > >> Thanks!
> > > > >> > > > > > > > > > > > > > > > >> Justine
> > > > >> > > > > > > > > > > > > > > > >>
> > > > >> > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > >
> > > > >> > > > > > > > > > > >
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > >
> > > > >> > > > > > > >
> > > > >> > > > > > >
> > > > >> > > > > >
> > > > >> > > > >
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > > >
> > > >
> > >
>

Re: [DISCUSS] KIP-854 Separate configuration for producer ID expiry

Posted by David Jacot <dj...@confluent.io.INVALID>.
Given that we already have
`transaction.abort.timed.out.transaction.cleanup.interval.ms` and
`transaction.remove.expired.transaction.cleanup.interval.ms`, it seems
OK to add another one for our case here. Regarding the name, I would
follow the pattern that we use for those two existing configs. We can
define it as an internal config as well. This allows you to change it
for your integration tests but does not advertise it.

Best,
David

On Thu, Aug 18, 2022 at 5:34 PM Justine Olshan
<jo...@confluent.io.invalid> wrote:
>
> Hi Ismael,
>
> We can do this if we think that an external configuration is not necessary.
> Just wondering, is there a reason we don't want an external configuration
> here?
>
> Thanks,
> Justine
>
> On Wed, Aug 17, 2022 at 12:30 PM Ismael Juma <is...@juma.me.uk> wrote:
>
> > Why does this have to be an external config? We can provide an internal
> > mechanism to configure this, no?
> >
> > Ismael
> >
> > On Wed, Aug 17, 2022 at 9:22 AM Justine Olshan
> > <jo...@confluent.io.invalid>
> > wrote:
> >
> > > Hey all,
> > > Quick update to this KIP. While working on the PR and tests I realized
> > that
> > > we have a hardcoded value for how often we clean up producer IDs.
> > Currently
> > > this value is 10 minutes and is defined in LogManager.
> > > I thought for better testing and ease of use of the new configuration, we
> > > should also be able to configure the cleanup interval.
> > >
> > > Here is the new configuration I'm hoping to add. I also added it to the
> > > KIP.
> > > name: producer.id.expiration.check.interval.ms
> > > description: The interval at which to remove producer IDs that have
> > expired
> > > due to producer.id.expiration.ms passing
> > > default: 600000 (10 minutes)
> > > valid values: [1,...]
> > > priority: low
> > > update-mode: read-only
> > >
> > > I left the default as the current hardcoded value to avoid disruptions.
> > If
> > > there are any issues with this change let me know.
> > > Thanks,
> > > Justine
> > >
> > > On Fri, Aug 5, 2022 at 1:40 PM Justine Olshan <jo...@confluent.io>
> > > wrote:
> > >
> > > > Awesome. Thanks Tom!
> > > >
> > > > I plan to open this KIP vote at the start of next week.
> > > > Thanks all for the discussion! Let me know if there is anything else.
> > :)
> > > >
> > > > Justine
> > > >
> > > > On Wed, Aug 3, 2022 at 11:32 AM Tom Bentley <tb...@redhat.com>
> > wrote:
> > > >
> > > >> Hi Justine,
> > > >>
> > > >> That all seems reasonable to me, thanks!
> > > >>
> > > >> On Wed, 3 Aug 2022 at 19:14, Justine Olshan
> > > <jolshan@confluent.io.invalid
> > > >> >
> > > >> wrote:
> > > >>
> > > >> > Hi Tom and Ismael,
> > > >> >
> > > >> > 1. Yes, there are definitely many ways to improve this issue and I
> > > plan
> > > >> to
> > > >> > write followup KIPs to address some of the larger changes.
> > > >> > Just wanted to get this simple fix in as a short term measure to
> > > prevent
> > > >> > issues with too many producer IDs in the cache. Stay tuned :)
> > > >> >
> > > >> > 2. I did have some offline discussion about informing the client. I
> > > >> think
> > > >> > for this specific KIP the default behavior in practice should not
> > > change
> > > >> > enough to require this information to go back to the client. In
> > other
> > > >> > words, a reasonable configuration should not regress behavior.
> > > However,
> > > >> > with the further changes I mention in 1, perhaps this is something
> > we
> > > >> want
> > > >> > to do. And yes -- unfortunately the current state of Kafka is no
> > > longer
> > > >> > totally consistent with KIP-98. This is something we probably want
> > to
> > > >> > clarify in the future.
> > > >> >
> > > >> > 3. I will update the config to mention it is not dynamic. I think
> > > since
> > > >> the
> > > >> > transactional id configuration is read-only, this should be too.
> > > >> >
> > > >> > 4. I can update this wording.
> > > >> >
> > > >> > 5. I think there are definitely benefits to the name `
> > > >> > idempotent.pid.expiration.ms` but there are other ways this could
> > > cause
> > > >> > confusion. And to be clear -- the configuration can expire a
> > producer
> > > ID
> > > >> > for a transactional producer as long as there isn't an ongoing
> > > >> transaction.
> > > >> >
> > > >> > Let me know if you have any questions and thanks for taking a look!
> > > >> >
> > > >> > Justine
> > > >> >
> > > >> > On Wed, Aug 3, 2022 at 9:30 AM Ismael Juma <is...@juma.me.uk>
> > wrote:
> > > >> >
> > > >> > > Regarding 1, more can certainly be done, but I think it would be
> > > >> > > complementary. As such, I think this KIP stands on its own and
> > > >> additional
> > > >> > > improvements can be handled via future KIPs (unless Justine wants
> > to
> > > >> > > combine things, of course).
> > > >> > >
> > > >> > > Ismael
> > > >> > >
> > > >> > > On Wed, Aug 3, 2022 at 9:12 AM Tom Bentley <tb...@redhat.com>
> > > >> wrote:
> > > >> > >
> > > >> > > > Hi Justine,
> > > >> > > >
> > > >> > > > Thanks for the KIP! I can see that this is a pragmatic attempt
> > to
> > > >> > > address a
> > > >> > > > nasty problem. I have a few questions:
> > > >> > > >
> > > >> > > > 1. The KIP makes the problem significantly harder to trigger,
> > but
> > > >> > doesn't
> > > >> > > > eliminate it entirely. How confident are you that it will be
> > > >> sufficient
> > > >> > > in
> > > >> > > > practice? We can point to applications which are creating
> > > idempotent
> > > >> > > > producers at a high rate and say they're broken, but that
> > doesn't
> > > do
> > > >> > > > anything to defend the broker from an interaction pattern that
> > > >> differs
> > > >> > > only
> > > >> > > > in rate from a "good application". Did you consider a new quota
> > to
> > > >> > limit
> > > >> > > > the rate at which a (principal, clientId) can allocate new PIDs?
> > > >> > > >
> > > >> > > > 2. The KIP contains this sentence: "when an idempotent
> > producer’s
> > > ID
> > > >> > > > expires, it silently loses its idempotency guarantees." That's
> > at
> > > >> odds
> > > >> > > with
> > > >> > > > my reading of "PID expiration" in the KIP-98 design[1], but it
> > > does
> > > >> > seem
> > > >> > > > consistent with a (brief!) look at the code. I accept that the
> > > risk
> > > >> > > should
> > > >> > > > be minimal so long as the expiration time is > the producer's
> > > >> delivery
> > > >> > > > timeout, but it would still be nice if we could detect this
> > > >> situation
> > > >> > and
> > > >> > > > return an error to the client. Is there a reason for the
> > apparent
> > > >> > > deviation
> > > >> > > > from KIP-98 (or am I misreading the code?)
> > > >> > > >
> > > >> > > > 3. Could the KIP be explicit on whether the new config will be
> > > >> > > dynamically
> > > >> > > > changeable?
> > > >> > > >
> > > >> > > > 4. The description of producer.id.expiration.ms mentions the
> > > >> > > > ProducerStateManager, which will mean nothing to a normal user.
> > We
> > > >> > could
> > > >> > > > probably change it to "a topic partition leader" without loss of
> > > >> > meaning.
> > > >> > > >
> > > >> > > > 5. The description also says "Producer IDs will not expire
> > while a
> > > >> > > > transaction associated to them is still ongoing." To me this
> > > >> suggests
> > > >> > > that
> > > >> > > > a more intuitive name for this config (from the user PoV) would
> > > >> include
> > > >> > > > "idempotent", since it does not cover the transactional case. (I
> > > >> would
> > > >> > > > suggest "idempotent.pid.expiration.ms" (c.f.
> > > >> > > > transactional.id.expiration.ms),
> > > >> > > > but the distinction between "id" and "pid" is easily missed–even
> > > if
> > > >> > it's
> > > >> > > > technically correct–so I'm not sure it's better than what you're
> > > >> > > > proposing). I only mention it in case it prompts someone else to
> > > >> find a
> > > >> > > > better name.
> > > >> > > >
> > > >> > > > Thanks again,
> > > >> > > >
> > > >> > > > Tom
> > > >> > > >
> > > >> > > > [1]:
> > > >> > > >
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > >
> > https://docs.google.com/document/d/11Jqy_GjUGtdXJK94XGsEIK7CP1SnQGdp2eF0wSw9ra8/edit#heading=h.loujdamc9ptj
> > > >> > > >
> > > >> > > > On Tue, 2 Aug 2022 at 22:00, Justine Olshan
> > > >> > <jolshan@confluent.io.invalid
> > > >> > > >
> > > >> > > > wrote:
> > > >> > > >
> > > >> > > > > I've updated the KIP to make the new minimum value 1 and
> > remove
> > > >> the
> > > >> > -1
> > > >> > > > > configuration.
> > > >> > > > > I've also added the low priority to the configuration and
> > edited
> > > >> the
> > > >> > > > > description as Ismael mentioned.
> > > >> > > > >
> > > >> > > > > I'm thinking about bringing this KIP to a vote soon! Let me
> > know
> > > >> if
> > > >> > > there
> > > >> > > > > are any other comments or questions.
> > > >> > > > >
> > > >> > > > > Thanks,
> > > >> > > > > Justine
> > > >> > > > >
> > > >> > > > > On Tue, Aug 2, 2022 at 9:02 AM Jason Gustafson
> > > >> > > > <jason@confluent.io.invalid
> > > >> > > > > >
> > > >> > > > > wrote:
> > > >> > > > >
> > > >> > > > > > I agree with Ismael that we should remove -1. I think we
> > tend
> > > to
> > > >> > view
> > > >> > > > the
> > > >> > > > > > coupling of these behaviors into a single configuration as a
> > > >> > mistake,
> > > >> > > > so
> > > >> > > > > > it's a little odd to keep it (even if in a weakened form).
> > > >> > > > > >
> > > >> > > > > > -Jason
> > > >> > > > > >
> > > >> > > > > > On Sat, Jul 30, 2022 at 7:37 AM Ismael Juma <
> > > ismael@juma.me.uk>
> > > >> > > wrote:
> > > >> > > > > >
> > > >> > > > > > > I would remove -1 altogether. Two more comments:
> > > >> > > > > > >
> > > >> > > > > > > 1. The current description kind of leads people towards
> > > >> aligning
> > > >> > > the
> > > >> > > > > > config
> > > >> > > > > > > with delivery.timeout.ms. Is that what we want? We could
> > > say
> > > >> it
> > > >> > > > should
> > > >> > > > > > be
> > > >> > > > > > > higher than delivery.timeout.ms but indicate that the
> > > >> default is
> > > >> > > > > usually
> > > >> > > > > > > fine. The main reason to reduce it would be to save
> > memory,
> > > I
> > > >> > > guess.
> > > >> > > > > > > 2. Each config has a priority, we should specify it for
> > this
> > > >> one.
> > > >> > > I'm
> > > >> > > > > > > assuming it will be "low".
> > > >> > > > > > >
> > > >> > > > > > > Ismael
> > > >> > > > > > >
> > > >> > > > > > > On Fri, Jul 29, 2022 at 2:38 PM Justine Olshan
> > > >> > > > > > > <jo...@confluent.io.invalid>
> > > >> > > > > > > wrote:
> > > >> > > > > > >
> > > >> > > > > > > > Hi all,
> > > >> > > > > > > >
> > > >> > > > > > > > I've updated the KIP to include the new default of 1 day
> > > and
> > > >> > > > > > information
> > > >> > > > > > > > about -1 in the description of the config.
> > > >> > > > > > > > I wonder though if including -1 makes sense now that it
> > is
> > > >> not
> > > >> > > the
> > > >> > > > > > > default
> > > >> > > > > > > > value. Is there a benefit for manually setting -1 vs
> > > >> manually
> > > >> > > > setting
> > > >> > > > > > the
> > > >> > > > > > > > value that transactional.id.expiration.ms has?
> > > >> > > > > > > >
> > > >> > > > > > > > Let me know your thoughts.
> > > >> > > > > > > > Thanks,
> > > >> > > > > > > > Justine
> > > >> > > > > > > >
> > > >> > > > > > > > On Fri, Jul 29, 2022 at 10:54 AM Ismael Juma <
> > > >> > ismael@juma.me.uk>
> > > >> > > > > > wrote:
> > > >> > > > > > > >
> > > >> > > > > > > > > +1 for having 1 day as the default and for including
> > > this
> > > >> > > change
> > > >> > > > in
> > > >> > > > > > the
> > > >> > > > > > > > > release notes.
> > > >> > > > > > > > >
> > > >> > > > > > > > > Ismael
> > > >> > > > > > > > >
> > > >> > > > > > > > > On Wed, Jul 27, 2022 at 9:16 AM Jason Gustafson
> > > >> > > > > > > > <jason@confluent.io.invalid
> > > >> > > > > > > > > >
> > > >> > > > > > > > > wrote:
> > > >> > > > > > > > >
> > > >> > > > > > > > > > I don't think a major release is a requirement for a
> > > >> > default
> > > >> > > > > change
> > > >> > > > > > > for
> > > >> > > > > > > > > > what it's worth. I do appreciate that there is a
> > > >> preference
> > > >> > > for
> > > >> > > > > not
> > > >> > > > > > > > > rocking
> > > >> > > > > > > > > > the boat though. For a little bit of background
> > here,
> > > >> the
> > > >> > > > problem
> > > >> > > > > > we
> > > >> > > > > > > > > > have encountered in production since the idempotent
> > > >> > producer
> > > >> > > > > became
> > > >> > > > > > > the
> > > >> > > > > > > > > > default is OOM errors due to huge numbers of
> > > producerIds
> > > >> > that
> > > >> > > > had
> > > >> > > > > > to
> > > >> > > > > > > be
> > > >> > > > > > > > > > retained in the cache for 7 days. It is hard to
> > > prevent
> > > >> use
> > > >> > > > cases
> > > >> > > > > > > from
> > > >> > > > > > > > > > emerging where producers are used and discarded
> > > >> rapidly. We
> > > >> > > > will
> > > >> > > > > be
> > > >> > > > > > > > > using a
> > > >> > > > > > > > > > lower value for sure, but it would also be nice to
> > > >> reduce
> > > >> > the
> > > >> > > > > > > > likelihood
> > > >> > > > > > > > > > for the community to see this problem. The benefit
> > of
> > > >> the
> > > >> > > > caching
> > > >> > > > > > > > > > diminishes quickly over time since it is primarily
> > > >> meant to
> > > >> > > > > handle
> > > >> > > > > > > > > producer
> > > >> > > > > > > > > > retry windows. I do not think there is much
> > difference
> > > >> > > between
> > > >> > > > 1
> > > >> > > > > > days
> > > >> > > > > > > > > and 7
> > > >> > > > > > > > > > days from an application perspective, but it is a
> > huge
> > > >> > > > difference
> > > >> > > > > > for
> > > >> > > > > > > > the
> > > >> > > > > > > > > > broker's memory usage.
> > > >> > > > > > > > > >
> > > >> > > > > > > > > > Best,
> > > >> > > > > > > > > > Jason
> > > >> > > > > > > > > >
> > > >> > > > > > > > > > On Wed, Jul 27, 2022 at 2:57 AM Sagar <
> > > >> > > > sagarmeansocean@gmail.com
> > > >> > > > > >
> > > >> > > > > > > > wrote:
> > > >> > > > > > > > > >
> > > >> > > > > > > > > > > Thanks Justine for the KIP. I think it might be
> > > >> better to
> > > >> > > > > > document
> > > >> > > > > > > > the
> > > >> > > > > > > > > > > correlation between the new config and
> > > >> > delivery.timeout.ms
> > > >> > > > in
> > > >> > > > > > the
> > > >> > > > > > > > > Public
> > > >> > > > > > > > > > > Interfaces Description.
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > Also, I agree with Luke that for now setting a
> > > >> default to
> > > >> > > -1
> > > >> > > > > > should
> > > >> > > > > > > > be
> > > >> > > > > > > > > > > good. We can look to switch to 1 day with major
> > > >> release.
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > Thanks!
> > > >> > > > > > > > > > > Sagar.
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > On Wed, Jul 27, 2022 at 9:05 AM Luke Chen <
> > > >> > > showuon@gmail.com
> > > >> > > > >
> > > >> > > > > > > wrote:
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > > Hi Justine,
> > > >> > > > > > > > > > > >
> > > >> > > > > > > > > > > > Thanks for the KIP.
> > > >> > > > > > > > > > > > I agree with you that we should try our best to
> > > keep
> > > >> > > > backward
> > > >> > > > > > > > > > > > compatibility, although our intention is to have
> > > >> lower
> > > >> > > > > producer
> > > >> > > > > > > id
> > > >> > > > > > > > > > > > expiration timeout.
> > > >> > > > > > > > > > > > So, I think we should keep default to -1 IMO.
> > > >> > > > > > > > > > > > Maybe we change the default to 1 day in next
> > major
> > > >> > > release
> > > >> > > > > > (4.0)?
> > > >> > > > > > > > > > > >
> > > >> > > > > > > > > > > > Thank you.
> > > >> > > > > > > > > > > > Luke
> > > >> > > > > > > > > > > >
> > > >> > > > > > > > > > > > On Wed, Jul 27, 2022 at 7:13 AM Justine Olshan
> > > >> > > > > > > > > > > > <jo...@confluent.io.invalid>
> > > >> > > > > > > > > > > > wrote:
> > > >> > > > > > > > > > > >
> > > >> > > > > > > > > > > > > Thanks for taking a look Jason!
> > > >> > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > I wondered if we wanted to have a smaller
> > > default
> > > >> but
> > > >> > > > > wasn't
> > > >> > > > > > > sure
> > > >> > > > > > > > > > about
> > > >> > > > > > > > > > > > the
> > > >> > > > > > > > > > > > > compatibility story -- especially since there
> > is
> > > >> the
> > > >> > > > chance
> > > >> > > > > > for
> > > >> > > > > > > > > > > producer
> > > >> > > > > > > > > > > > > IDs to expire silently.
> > > >> > > > > > > > > > > > > I do think that 1 day is fairly reasonable.
> > If I
> > > >> > don't
> > > >> > > > hear
> > > >> > > > > > any
> > > >> > > > > > > > > > > > conflicting
> > > >> > > > > > > > > > > > > opinions I can go ahead and update the
> > default.
> > > >> > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > Justine
> > > >> > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > On Tue, Jul 26, 2022 at 12:23 PM Jason
> > Gustafson
> > > >> > > > > > > > > > > > > <ja...@confluent.io.invalid>
> > > >> > > > > > > > > > > > > wrote:
> > > >> > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > Hi Justine,
> > > >> > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > Thanks for the KIP. Although I hate seeing
> > new
> > > >> > > > > > > configurations,
> > > >> > > > > > > > I
> > > >> > > > > > > > > > > think
> > > >> > > > > > > > > > > > > this
> > > >> > > > > > > > > > > > > > is a good change. Combining these timeout
> > > >> behaviors
> > > >> > > > into
> > > >> > > > > a
> > > >> > > > > > > > single
> > > >> > > > > > > > > > > > > > configuration was definitely a mistake, but
> > we
> > > >> > didn't
> > > >> > > > > > > > anticipate
> > > >> > > > > > > > > > the
> > > >> > > > > > > > > > > > > > problem with the producer id cache. I do
> > > wonder
> > > >> if
> > > >> > we
> > > >> > > > can
> > > >> > > > > > > make
> > > >> > > > > > > > > the
> > > >> > > > > > > > > > > > > default
> > > >> > > > > > > > > > > > > > a bit lower to reduce the chances that users
> > > >> will
> > > >> > hit
> > > >> > > > the
> > > >> > > > > > > same
> > > >> > > > > > > > > > memory
> > > >> > > > > > > > > > > > > > issues we have seen. After decoupling this
> > > >> > > > configuration
> > > >> > > > > > from
> > > >> > > > > > > > > > > > > > transactional.id.expiration.ms, the new
> > > timeout
> > > >> > just
> > > >> > > > > needs
> > > >> > > > > > > to
> > > >> > > > > > > > > > cover
> > > >> > > > > > > > > > > > the
> > > >> > > > > > > > > > > > > > longest duration that a producer might be
> > > >> retrying
> > > >> > > the
> > > >> > > > > same
> > > >> > > > > > > > > Produce
> > > >> > > > > > > > > > > > > > request. 7 days seems too high. Although I
> > > >> think it
> > > >> > > > could
> > > >> > > > > > go
> > > >> > > > > > > a
> > > >> > > > > > > > > fair
> > > >> > > > > > > > > > > > even
> > > >> > > > > > > > > > > > > > lower, perhaps 1 day is a reasonable place
> > to
> > > >> > start?
> > > >> > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > Thanks,
> > > >> > > > > > > > > > > > > > Jason
> > > >> > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > On Mon, Jul 25, 2022 at 9:25 AM Justine
> > Olshan
> > > >> > > > > > > > > > > > > > <jo...@confluent.io.invalid>
> > > >> > > > > > > > > > > > > > wrote:
> > > >> > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > Hey Bill,
> > > >> > > > > > > > > > > > > > > Thanks! I was just going to say that
> > > hopefully
> > > >> > > > > > > > > > > > > > > transactional.id.expiration.ms would also
> > > be
> > > >> > over
> > > >> > > > the
> > > >> > > > > > > > delivery
> > > >> > > > > > > > > > > > > timeout.
> > > >> > > > > > > > > > > > > > :)
> > > >> > > > > > > > > > > > > > > Thanks for the +1!
> > > >> > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > Justine
> > > >> > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > On Mon, Jul 25, 2022 at 9:17 AM Bill
> > Bejeck
> > > <
> > > >> > > > > > > > bbejeck@gmail.com
> > > >> > > > > > > > > >
> > > >> > > > > > > > > > > > wrote:
> > > >> > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > Hi Justine,
> > > >> > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > I just took another look at the KIP,
> > and I
> > > >> > > realize
> > > >> > > > my
> > > >> > > > > > > > > > > > > > question/suggestion
> > > >> > > > > > > > > > > > > > > > about default values has already been
> > > >> addressed
> > > >> > > in
> > > >> > > > > the
> > > >> > > > > > > > > > > > > `Compatibility`
> > > >> > > > > > > > > > > > > > > > section.
> > > >> > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > I'm +1 on the KIP.
> > > >> > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > -Bill
> > > >> > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > On Thu, Jul 21, 2022 at 6:20 PM Bill
> > > Bejeck
> > > >> <
> > > >> > > > > > > > > bbejeck@gmail.com
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > > > wrote:
> > > >> > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > Hi Justine,
> > > >> > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > Thanks for the well written KIP, this
> > > >> looks
> > > >> > > like
> > > >> > > > it
> > > >> > > > > > > will
> > > >> > > > > > > > > be a
> > > >> > > > > > > > > > > > > useful
> > > >> > > > > > > > > > > > > > > > > addition.
> > > >> > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > Overall the KIP looks good to me, I
> > have
> > > >> one
> > > >> > > > > > > > > > question/comment.
> > > >> > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > You mentioned that setting the `
> > > >> > > > > > > > producer.id.expiration.ms`
> > > >> > > > > > > > > > > less
> > > >> > > > > > > > > > > > > than
> > > >> > > > > > > > > > > > > > > the
> > > >> > > > > > > > > > > > > > > > > delivery timeout could lead to
> > > duplicates,
> > > >> > > which
> > > >> > > > > > makes
> > > >> > > > > > > > > sense.
> > > >> > > > > > > > > > > To
> > > >> > > > > > > > > > > > > > help
> > > >> > > > > > > > > > > > > > > > > avoid this situation, do we want to
> > > >> consider
> > > >> > a
> > > >> > > > > > default
> > > >> > > > > > > > > value
> > > >> > > > > > > > > > > that
> > > >> > > > > > > > > > > > > is
> > > >> > > > > > > > > > > > > > > the
> > > >> > > > > > > > > > > > > > > > > same as the delivery timeout?
> > > >> > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > Thanks again for the KIP.
> > > >> > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > Bill
> > > >> > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > > On Thu, Jul 21, 2022 at 4:54 PM
> > Justine
> > > >> > Olshan
> > > >> > > > > > > > > > > > > > > > > <jo...@confluent.io.invalid> wrote:
> > > >> > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > > >> Hey all!
> > > >> > > > > > > > > > > > > > > > >>
> > > >> > > > > > > > > > > > > > > > >> I'd like to start a discussion on my
> > > >> > proposal
> > > >> > > to
> > > >> > > > > > > > separate
> > > >> > > > > > > > > > > > > time-based
> > > >> > > > > > > > > > > > > > > > >> producer ID expiration from
> > > >> transactional ID
> > > >> > > > > > > expiration
> > > >> > > > > > > > by
> > > >> > > > > > > > > > > > > > > introducing a
> > > >> > > > > > > > > > > > > > > > >> new configuration.
> > > >> > > > > > > > > > > > > > > > >>
> > > >> > > > > > > > > > > > > > > > >> The KIP Is pretty small and simple,
> > but
> > > >> will
> > > >> > > be
> > > >> > > > > > > helpful
> > > >> > > > > > > > in
> > > >> > > > > > > > > > > > > > controlling
> > > >> > > > > > > > > > > > > > > > >> memory usage in brokers -- especially
> > > now
> > > >> > that
> > > >> > > > by
> > > >> > > > > > > > default
> > > >> > > > > > > > > > > > > producers
> > > >> > > > > > > > > > > > > > > are
> > > >> > > > > > > > > > > > > > > > >> idempotent and create producer ID
> > > state.
> > > >> > > > > > > > > > > > > > > > >>
> > > >> > > > > > > > > > > > > > > > >> Please take a look and leave any
> > > comments
> > > >> > you
> > > >> > > > may
> > > >> > > > > > > have!
> > > >> > > > > > > > > > > > > > > > >>
> > > >> > > > > > > > > > > > > > > > >> KIP:
> > > >> > > > > > > > > > > > > > > > >>
> > > >> > > > > > > > > > > > > > > > >>
> > > >> > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > >
> > > >> > > > > > > > > > > >
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > >
> > > >> > > > > > > > >
> > > >> > > > > > > >
> > > >> > > > > > >
> > > >> > > > > >
> > > >> > > > >
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-854+Separate+configuration+for+producer+ID+expiry
> > > >> > > > > > > > > > > > > > > > >> JIRA:
> > > >> > > > > > > https://issues.apache.org/jira/browse/KAFKA-14097
> > > >> > > > > > > > > > > > > > > > >>
> > > >> > > > > > > > > > > > > > > > >> Thanks!
> > > >> > > > > > > > > > > > > > > > >> Justine
> > > >> > > > > > > > > > > > > > > > >>
> > > >> > > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > > >
> > > >> > > > > > > > > > > > >
> > > >> > > > > > > > > > > >
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > >
> > > >> > > > > > > > >
> > > >> > > > > > > >
> > > >> > > > > > >
> > > >> > > > > >
> > > >> > > > >
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > > >
> > >
> >

Re: [DISCUSS] KIP-854 Separate configuration for producer ID expiry

Posted by Justine Olshan <jo...@confluent.io.INVALID>.
Hi Ismael,

We can do this if we think that an external configuration is not necessary.
Just wondering, is there a reason we don't want an external configuration
here?

Thanks,
Justine

On Wed, Aug 17, 2022 at 12:30 PM Ismael Juma <is...@juma.me.uk> wrote:

> Why does this have to be an external config? We can provide an internal
> mechanism to configure this, no?
>
> Ismael
>
> On Wed, Aug 17, 2022 at 9:22 AM Justine Olshan
> <jo...@confluent.io.invalid>
> wrote:
>
> > Hey all,
> > Quick update to this KIP. While working on the PR and tests I realized
> that
> > we have a hardcoded value for how often we clean up producer IDs.
> Currently
> > this value is 10 minutes and is defined in LogManager.
> > I thought for better testing and ease of use of the new configuration, we
> > should also be able to configure the cleanup interval.
> >
> > Here is the new configuration I'm hoping to add. I also added it to the
> > KIP.
> > name: producer.id.expiration.check.interval.ms
> > description: The interval at which to remove producer IDs that have
> expired
> > due to producer.id.expiration.ms passing
> > default: 600000 (10 minutes)
> > valid values: [1,...]
> > priority: low
> > update-mode: read-only
> >
> > I left the default as the current hardcoded value to avoid disruptions.
> If
> > there are any issues with this change let me know.
> > Thanks,
> > Justine
> >
> > On Fri, Aug 5, 2022 at 1:40 PM Justine Olshan <jo...@confluent.io>
> > wrote:
> >
> > > Awesome. Thanks Tom!
> > >
> > > I plan to open this KIP vote at the start of next week.
> > > Thanks all for the discussion! Let me know if there is anything else.
> :)
> > >
> > > Justine
> > >
> > > On Wed, Aug 3, 2022 at 11:32 AM Tom Bentley <tb...@redhat.com>
> wrote:
> > >
> > >> Hi Justine,
> > >>
> > >> That all seems reasonable to me, thanks!
> > >>
> > >> On Wed, 3 Aug 2022 at 19:14, Justine Olshan
> > <jolshan@confluent.io.invalid
> > >> >
> > >> wrote:
> > >>
> > >> > Hi Tom and Ismael,
> > >> >
> > >> > 1. Yes, there are definitely many ways to improve this issue and I
> > plan
> > >> to
> > >> > write followup KIPs to address some of the larger changes.
> > >> > Just wanted to get this simple fix in as a short term measure to
> > prevent
> > >> > issues with too many producer IDs in the cache. Stay tuned :)
> > >> >
> > >> > 2. I did have some offline discussion about informing the client. I
> > >> think
> > >> > for this specific KIP the default behavior in practice should not
> > change
> > >> > enough to require this information to go back to the client. In
> other
> > >> > words, a reasonable configuration should not regress behavior.
> > However,
> > >> > with the further changes I mention in 1, perhaps this is something
> we
> > >> want
> > >> > to do. And yes -- unfortunately the current state of Kafka is no
> > longer
> > >> > totally consistent with KIP-98. This is something we probably want
> to
> > >> > clarify in the future.
> > >> >
> > >> > 3. I will update the config to mention it is not dynamic. I think
> > since
> > >> the
> > >> > transactional id configuration is read-only, this should be too.
> > >> >
> > >> > 4. I can update this wording.
> > >> >
> > >> > 5. I think there are definitely benefits to the name `
> > >> > idempotent.pid.expiration.ms` but there are other ways this could
> > cause
> > >> > confusion. And to be clear -- the configuration can expire a
> producer
> > ID
> > >> > for a transactional producer as long as there isn't an ongoing
> > >> transaction.
> > >> >
> > >> > Let me know if you have any questions and thanks for taking a look!
> > >> >
> > >> > Justine
> > >> >
> > >> > On Wed, Aug 3, 2022 at 9:30 AM Ismael Juma <is...@juma.me.uk>
> wrote:
> > >> >
> > >> > > Regarding 1, more can certainly be done, but I think it would be
> > >> > > complementary. As such, I think this KIP stands on its own and
> > >> additional
> > >> > > improvements can be handled via future KIPs (unless Justine wants
> to
> > >> > > combine things, of course).
> > >> > >
> > >> > > Ismael
> > >> > >
> > >> > > On Wed, Aug 3, 2022 at 9:12 AM Tom Bentley <tb...@redhat.com>
> > >> wrote:
> > >> > >
> > >> > > > Hi Justine,
> > >> > > >
> > >> > > > Thanks for the KIP! I can see that this is a pragmatic attempt
> to
> > >> > > address a
> > >> > > > nasty problem. I have a few questions:
> > >> > > >
> > >> > > > 1. The KIP makes the problem significantly harder to trigger,
> but
> > >> > doesn't
> > >> > > > eliminate it entirely. How confident are you that it will be
> > >> sufficient
> > >> > > in
> > >> > > > practice? We can point to applications which are creating
> > idempotent
> > >> > > > producers at a high rate and say they're broken, but that
> doesn't
> > do
> > >> > > > anything to defend the broker from an interaction pattern that
> > >> differs
> > >> > > only
> > >> > > > in rate from a "good application". Did you consider a new quota
> to
> > >> > limit
> > >> > > > the rate at which a (principal, clientId) can allocate new PIDs?
> > >> > > >
> > >> > > > 2. The KIP contains this sentence: "when an idempotent
> producer’s
> > ID
> > >> > > > expires, it silently loses its idempotency guarantees." That's
> at
> > >> odds
> > >> > > with
> > >> > > > my reading of "PID expiration" in the KIP-98 design[1], but it
> > does
> > >> > seem
> > >> > > > consistent with a (brief!) look at the code. I accept that the
> > risk
> > >> > > should
> > >> > > > be minimal so long as the expiration time is > the producer's
> > >> delivery
> > >> > > > timeout, but it would still be nice if we could detect this
> > >> situation
> > >> > and
> > >> > > > return an error to the client. Is there a reason for the
> apparent
> > >> > > deviation
> > >> > > > from KIP-98 (or am I misreading the code?)
> > >> > > >
> > >> > > > 3. Could the KIP be explicit on whether the new config will be
> > >> > > dynamically
> > >> > > > changeable?
> > >> > > >
> > >> > > > 4. The description of producer.id.expiration.ms mentions the
> > >> > > > ProducerStateManager, which will mean nothing to a normal user.
> We
> > >> > could
> > >> > > > probably change it to "a topic partition leader" without loss of
> > >> > meaning.
> > >> > > >
> > >> > > > 5. The description also says "Producer IDs will not expire
> while a
> > >> > > > transaction associated to them is still ongoing." To me this
> > >> suggests
> > >> > > that
> > >> > > > a more intuitive name for this config (from the user PoV) would
> > >> include
> > >> > > > "idempotent", since it does not cover the transactional case. (I
> > >> would
> > >> > > > suggest "idempotent.pid.expiration.ms" (c.f.
> > >> > > > transactional.id.expiration.ms),
> > >> > > > but the distinction between "id" and "pid" is easily missed–even
> > if
> > >> > it's
> > >> > > > technically correct–so I'm not sure it's better than what you're
> > >> > > > proposing). I only mention it in case it prompts someone else to
> > >> find a
> > >> > > > better name.
> > >> > > >
> > >> > > > Thanks again,
> > >> > > >
> > >> > > > Tom
> > >> > > >
> > >> > > > [1]:
> > >> > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
> https://docs.google.com/document/d/11Jqy_GjUGtdXJK94XGsEIK7CP1SnQGdp2eF0wSw9ra8/edit#heading=h.loujdamc9ptj
> > >> > > >
> > >> > > > On Tue, 2 Aug 2022 at 22:00, Justine Olshan
> > >> > <jolshan@confluent.io.invalid
> > >> > > >
> > >> > > > wrote:
> > >> > > >
> > >> > > > > I've updated the KIP to make the new minimum value 1 and
> remove
> > >> the
> > >> > -1
> > >> > > > > configuration.
> > >> > > > > I've also added the low priority to the configuration and
> edited
> > >> the
> > >> > > > > description as Ismael mentioned.
> > >> > > > >
> > >> > > > > I'm thinking about bringing this KIP to a vote soon! Let me
> know
> > >> if
> > >> > > there
> > >> > > > > are any other comments or questions.
> > >> > > > >
> > >> > > > > Thanks,
> > >> > > > > Justine
> > >> > > > >
> > >> > > > > On Tue, Aug 2, 2022 at 9:02 AM Jason Gustafson
> > >> > > > <jason@confluent.io.invalid
> > >> > > > > >
> > >> > > > > wrote:
> > >> > > > >
> > >> > > > > > I agree with Ismael that we should remove -1. I think we
> tend
> > to
> > >> > view
> > >> > > > the
> > >> > > > > > coupling of these behaviors into a single configuration as a
> > >> > mistake,
> > >> > > > so
> > >> > > > > > it's a little odd to keep it (even if in a weakened form).
> > >> > > > > >
> > >> > > > > > -Jason
> > >> > > > > >
> > >> > > > > > On Sat, Jul 30, 2022 at 7:37 AM Ismael Juma <
> > ismael@juma.me.uk>
> > >> > > wrote:
> > >> > > > > >
> > >> > > > > > > I would remove -1 altogether. Two more comments:
> > >> > > > > > >
> > >> > > > > > > 1. The current description kind of leads people towards
> > >> aligning
> > >> > > the
> > >> > > > > > config
> > >> > > > > > > with delivery.timeout.ms. Is that what we want? We could
> > say
> > >> it
> > >> > > > should
> > >> > > > > > be
> > >> > > > > > > higher than delivery.timeout.ms but indicate that the
> > >> default is
> > >> > > > > usually
> > >> > > > > > > fine. The main reason to reduce it would be to save
> memory,
> > I
> > >> > > guess.
> > >> > > > > > > 2. Each config has a priority, we should specify it for
> this
> > >> one.
> > >> > > I'm
> > >> > > > > > > assuming it will be "low".
> > >> > > > > > >
> > >> > > > > > > Ismael
> > >> > > > > > >
> > >> > > > > > > On Fri, Jul 29, 2022 at 2:38 PM Justine Olshan
> > >> > > > > > > <jo...@confluent.io.invalid>
> > >> > > > > > > wrote:
> > >> > > > > > >
> > >> > > > > > > > Hi all,
> > >> > > > > > > >
> > >> > > > > > > > I've updated the KIP to include the new default of 1 day
> > and
> > >> > > > > > information
> > >> > > > > > > > about -1 in the description of the config.
> > >> > > > > > > > I wonder though if including -1 makes sense now that it
> is
> > >> not
> > >> > > the
> > >> > > > > > > default
> > >> > > > > > > > value. Is there a benefit for manually setting -1 vs
> > >> manually
> > >> > > > setting
> > >> > > > > > the
> > >> > > > > > > > value that transactional.id.expiration.ms has?
> > >> > > > > > > >
> > >> > > > > > > > Let me know your thoughts.
> > >> > > > > > > > Thanks,
> > >> > > > > > > > Justine
> > >> > > > > > > >
> > >> > > > > > > > On Fri, Jul 29, 2022 at 10:54 AM Ismael Juma <
> > >> > ismael@juma.me.uk>
> > >> > > > > > wrote:
> > >> > > > > > > >
> > >> > > > > > > > > +1 for having 1 day as the default and for including
> > this
> > >> > > change
> > >> > > > in
> > >> > > > > > the
> > >> > > > > > > > > release notes.
> > >> > > > > > > > >
> > >> > > > > > > > > Ismael
> > >> > > > > > > > >
> > >> > > > > > > > > On Wed, Jul 27, 2022 at 9:16 AM Jason Gustafson
> > >> > > > > > > > <jason@confluent.io.invalid
> > >> > > > > > > > > >
> > >> > > > > > > > > wrote:
> > >> > > > > > > > >
> > >> > > > > > > > > > I don't think a major release is a requirement for a
> > >> > default
> > >> > > > > change
> > >> > > > > > > for
> > >> > > > > > > > > > what it's worth. I do appreciate that there is a
> > >> preference
> > >> > > for
> > >> > > > > not
> > >> > > > > > > > > rocking
> > >> > > > > > > > > > the boat though. For a little bit of background
> here,
> > >> the
> > >> > > > problem
> > >> > > > > > we
> > >> > > > > > > > > > have encountered in production since the idempotent
> > >> > producer
> > >> > > > > became
> > >> > > > > > > the
> > >> > > > > > > > > > default is OOM errors due to huge numbers of
> > producerIds
> > >> > that
> > >> > > > had
> > >> > > > > > to
> > >> > > > > > > be
> > >> > > > > > > > > > retained in the cache for 7 days. It is hard to
> > prevent
> > >> use
> > >> > > > cases
> > >> > > > > > > from
> > >> > > > > > > > > > emerging where producers are used and discarded
> > >> rapidly. We
> > >> > > > will
> > >> > > > > be
> > >> > > > > > > > > using a
> > >> > > > > > > > > > lower value for sure, but it would also be nice to
> > >> reduce
> > >> > the
> > >> > > > > > > > likelihood
> > >> > > > > > > > > > for the community to see this problem. The benefit
> of
> > >> the
> > >> > > > caching
> > >> > > > > > > > > > diminishes quickly over time since it is primarily
> > >> meant to
> > >> > > > > handle
> > >> > > > > > > > > producer
> > >> > > > > > > > > > retry windows. I do not think there is much
> difference
> > >> > > between
> > >> > > > 1
> > >> > > > > > days
> > >> > > > > > > > > and 7
> > >> > > > > > > > > > days from an application perspective, but it is a
> huge
> > >> > > > difference
> > >> > > > > > for
> > >> > > > > > > > the
> > >> > > > > > > > > > broker's memory usage.
> > >> > > > > > > > > >
> > >> > > > > > > > > > Best,
> > >> > > > > > > > > > Jason
> > >> > > > > > > > > >
> > >> > > > > > > > > > On Wed, Jul 27, 2022 at 2:57 AM Sagar <
> > >> > > > sagarmeansocean@gmail.com
> > >> > > > > >
> > >> > > > > > > > wrote:
> > >> > > > > > > > > >
> > >> > > > > > > > > > > Thanks Justine for the KIP. I think it might be
> > >> better to
> > >> > > > > > document
> > >> > > > > > > > the
> > >> > > > > > > > > > > correlation between the new config and
> > >> > delivery.timeout.ms
> > >> > > > in
> > >> > > > > > the
> > >> > > > > > > > > Public
> > >> > > > > > > > > > > Interfaces Description.
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > Also, I agree with Luke that for now setting a
> > >> default to
> > >> > > -1
> > >> > > > > > should
> > >> > > > > > > > be
> > >> > > > > > > > > > > good. We can look to switch to 1 day with major
> > >> release.
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > Thanks!
> > >> > > > > > > > > > > Sagar.
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > On Wed, Jul 27, 2022 at 9:05 AM Luke Chen <
> > >> > > showuon@gmail.com
> > >> > > > >
> > >> > > > > > > wrote:
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > > Hi Justine,
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > > > Thanks for the KIP.
> > >> > > > > > > > > > > > I agree with you that we should try our best to
> > keep
> > >> > > > backward
> > >> > > > > > > > > > > > compatibility, although our intention is to have
> > >> lower
> > >> > > > > producer
> > >> > > > > > > id
> > >> > > > > > > > > > > > expiration timeout.
> > >> > > > > > > > > > > > So, I think we should keep default to -1 IMO.
> > >> > > > > > > > > > > > Maybe we change the default to 1 day in next
> major
> > >> > > release
> > >> > > > > > (4.0)?
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > > > Thank you.
> > >> > > > > > > > > > > > Luke
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > > > On Wed, Jul 27, 2022 at 7:13 AM Justine Olshan
> > >> > > > > > > > > > > > <jo...@confluent.io.invalid>
> > >> > > > > > > > > > > > wrote:
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > > > > Thanks for taking a look Jason!
> > >> > > > > > > > > > > > >
> > >> > > > > > > > > > > > > I wondered if we wanted to have a smaller
> > default
> > >> but
> > >> > > > > wasn't
> > >> > > > > > > sure
> > >> > > > > > > > > > about
> > >> > > > > > > > > > > > the
> > >> > > > > > > > > > > > > compatibility story -- especially since there
> is
> > >> the
> > >> > > > chance
> > >> > > > > > for
> > >> > > > > > > > > > > producer
> > >> > > > > > > > > > > > > IDs to expire silently.
> > >> > > > > > > > > > > > > I do think that 1 day is fairly reasonable.
> If I
> > >> > don't
> > >> > > > hear
> > >> > > > > > any
> > >> > > > > > > > > > > > conflicting
> > >> > > > > > > > > > > > > opinions I can go ahead and update the
> default.
> > >> > > > > > > > > > > > >
> > >> > > > > > > > > > > > > Justine
> > >> > > > > > > > > > > > >
> > >> > > > > > > > > > > > > On Tue, Jul 26, 2022 at 12:23 PM Jason
> Gustafson
> > >> > > > > > > > > > > > > <ja...@confluent.io.invalid>
> > >> > > > > > > > > > > > > wrote:
> > >> > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > Hi Justine,
> > >> > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > Thanks for the KIP. Although I hate seeing
> new
> > >> > > > > > > configurations,
> > >> > > > > > > > I
> > >> > > > > > > > > > > think
> > >> > > > > > > > > > > > > this
> > >> > > > > > > > > > > > > > is a good change. Combining these timeout
> > >> behaviors
> > >> > > > into
> > >> > > > > a
> > >> > > > > > > > single
> > >> > > > > > > > > > > > > > configuration was definitely a mistake, but
> we
> > >> > didn't
> > >> > > > > > > > anticipate
> > >> > > > > > > > > > the
> > >> > > > > > > > > > > > > > problem with the producer id cache. I do
> > wonder
> > >> if
> > >> > we
> > >> > > > can
> > >> > > > > > > make
> > >> > > > > > > > > the
> > >> > > > > > > > > > > > > default
> > >> > > > > > > > > > > > > > a bit lower to reduce the chances that users
> > >> will
> > >> > hit
> > >> > > > the
> > >> > > > > > > same
> > >> > > > > > > > > > memory
> > >> > > > > > > > > > > > > > issues we have seen. After decoupling this
> > >> > > > configuration
> > >> > > > > > from
> > >> > > > > > > > > > > > > > transactional.id.expiration.ms, the new
> > timeout
> > >> > just
> > >> > > > > needs
> > >> > > > > > > to
> > >> > > > > > > > > > cover
> > >> > > > > > > > > > > > the
> > >> > > > > > > > > > > > > > longest duration that a producer might be
> > >> retrying
> > >> > > the
> > >> > > > > same
> > >> > > > > > > > > Produce
> > >> > > > > > > > > > > > > > request. 7 days seems too high. Although I
> > >> think it
> > >> > > > could
> > >> > > > > > go
> > >> > > > > > > a
> > >> > > > > > > > > fair
> > >> > > > > > > > > > > > even
> > >> > > > > > > > > > > > > > lower, perhaps 1 day is a reasonable place
> to
> > >> > start?
> > >> > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > Thanks,
> > >> > > > > > > > > > > > > > Jason
> > >> > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > On Mon, Jul 25, 2022 at 9:25 AM Justine
> Olshan
> > >> > > > > > > > > > > > > > <jo...@confluent.io.invalid>
> > >> > > > > > > > > > > > > > wrote:
> > >> > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > > Hey Bill,
> > >> > > > > > > > > > > > > > > Thanks! I was just going to say that
> > hopefully
> > >> > > > > > > > > > > > > > > transactional.id.expiration.ms would also
> > be
> > >> > over
> > >> > > > the
> > >> > > > > > > > delivery
> > >> > > > > > > > > > > > > timeout.
> > >> > > > > > > > > > > > > > :)
> > >> > > > > > > > > > > > > > > Thanks for the +1!
> > >> > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > > Justine
> > >> > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > > On Mon, Jul 25, 2022 at 9:17 AM Bill
> Bejeck
> > <
> > >> > > > > > > > bbejeck@gmail.com
> > >> > > > > > > > > >
> > >> > > > > > > > > > > > wrote:
> > >> > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > > > Hi Justine,
> > >> > > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > > > I just took another look at the KIP,
> and I
> > >> > > realize
> > >> > > > my
> > >> > > > > > > > > > > > > > question/suggestion
> > >> > > > > > > > > > > > > > > > about default values has already been
> > >> addressed
> > >> > > in
> > >> > > > > the
> > >> > > > > > > > > > > > > `Compatibility`
> > >> > > > > > > > > > > > > > > > section.
> > >> > > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > > > I'm +1 on the KIP.
> > >> > > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > > > -Bill
> > >> > > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > > > On Thu, Jul 21, 2022 at 6:20 PM Bill
> > Bejeck
> > >> <
> > >> > > > > > > > > bbejeck@gmail.com
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > > > wrote:
> > >> > > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > > > > Hi Justine,
> > >> > > > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > > > > Thanks for the well written KIP, this
> > >> looks
> > >> > > like
> > >> > > > it
> > >> > > > > > > will
> > >> > > > > > > > > be a
> > >> > > > > > > > > > > > > useful
> > >> > > > > > > > > > > > > > > > > addition.
> > >> > > > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > > > > Overall the KIP looks good to me, I
> have
> > >> one
> > >> > > > > > > > > > question/comment.
> > >> > > > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > > > > You mentioned that setting the `
> > >> > > > > > > > producer.id.expiration.ms`
> > >> > > > > > > > > > > less
> > >> > > > > > > > > > > > > than
> > >> > > > > > > > > > > > > > > the
> > >> > > > > > > > > > > > > > > > > delivery timeout could lead to
> > duplicates,
> > >> > > which
> > >> > > > > > makes
> > >> > > > > > > > > sense.
> > >> > > > > > > > > > > To
> > >> > > > > > > > > > > > > > help
> > >> > > > > > > > > > > > > > > > > avoid this situation, do we want to
> > >> consider
> > >> > a
> > >> > > > > > default
> > >> > > > > > > > > value
> > >> > > > > > > > > > > that
> > >> > > > > > > > > > > > > is
> > >> > > > > > > > > > > > > > > the
> > >> > > > > > > > > > > > > > > > > same as the delivery timeout?
> > >> > > > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > > > > Thanks again for the KIP.
> > >> > > > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > > > > Bill
> > >> > > > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > > > > On Thu, Jul 21, 2022 at 4:54 PM
> Justine
> > >> > Olshan
> > >> > > > > > > > > > > > > > > > > <jo...@confluent.io.invalid> wrote:
> > >> > > > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > > > >> Hey all!
> > >> > > > > > > > > > > > > > > > >>
> > >> > > > > > > > > > > > > > > > >> I'd like to start a discussion on my
> > >> > proposal
> > >> > > to
> > >> > > > > > > > separate
> > >> > > > > > > > > > > > > time-based
> > >> > > > > > > > > > > > > > > > >> producer ID expiration from
> > >> transactional ID
> > >> > > > > > > expiration
> > >> > > > > > > > by
> > >> > > > > > > > > > > > > > > introducing a
> > >> > > > > > > > > > > > > > > > >> new configuration.
> > >> > > > > > > > > > > > > > > > >>
> > >> > > > > > > > > > > > > > > > >> The KIP Is pretty small and simple,
> but
> > >> will
> > >> > > be
> > >> > > > > > > helpful
> > >> > > > > > > > in
> > >> > > > > > > > > > > > > > controlling
> > >> > > > > > > > > > > > > > > > >> memory usage in brokers -- especially
> > now
> > >> > that
> > >> > > > by
> > >> > > > > > > > default
> > >> > > > > > > > > > > > > producers
> > >> > > > > > > > > > > > > > > are
> > >> > > > > > > > > > > > > > > > >> idempotent and create producer ID
> > state.
> > >> > > > > > > > > > > > > > > > >>
> > >> > > > > > > > > > > > > > > > >> Please take a look and leave any
> > comments
> > >> > you
> > >> > > > may
> > >> > > > > > > have!
> > >> > > > > > > > > > > > > > > > >>
> > >> > > > > > > > > > > > > > > > >> KIP:
> > >> > > > > > > > > > > > > > > > >>
> > >> > > > > > > > > > > > > > > > >>
> > >> > > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > >
> > >> > > > > > > > > > > > >
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > >
> > >> > > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > >
> > >> > > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-854+Separate+configuration+for+producer+ID+expiry
> > >> > > > > > > > > > > > > > > > >> JIRA:
> > >> > > > > > > https://issues.apache.org/jira/browse/KAFKA-14097
> > >> > > > > > > > > > > > > > > > >>
> > >> > > > > > > > > > > > > > > > >> Thanks!
> > >> > > > > > > > > > > > > > > > >> Justine
> > >> > > > > > > > > > > > > > > > >>
> > >> > > > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > >
> > >> > > > > > > > > > > > >
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > >
> > >> > > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > >
> > >> > > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >
> >
>

Re: [DISCUSS] KIP-854 Separate configuration for producer ID expiry

Posted by Ismael Juma <is...@juma.me.uk>.
Why does this have to be an external config? We can provide an internal
mechanism to configure this, no?

Ismael

On Wed, Aug 17, 2022 at 9:22 AM Justine Olshan <jo...@confluent.io.invalid>
wrote:

> Hey all,
> Quick update to this KIP. While working on the PR and tests I realized that
> we have a hardcoded value for how often we clean up producer IDs. Currently
> this value is 10 minutes and is defined in LogManager.
> I thought for better testing and ease of use of the new configuration, we
> should also be able to configure the cleanup interval.
>
> Here is the new configuration I'm hoping to add. I also added it to the
> KIP.
> name: producer.id.expiration.check.interval.ms
> description: The interval at which to remove producer IDs that have expired
> due to producer.id.expiration.ms passing
> default: 600000 (10 minutes)
> valid values: [1,...]
> priority: low
> update-mode: read-only
>
> I left the default as the current hardcoded value to avoid disruptions. If
> there are any issues with this change let me know.
> Thanks,
> Justine
>
> On Fri, Aug 5, 2022 at 1:40 PM Justine Olshan <jo...@confluent.io>
> wrote:
>
> > Awesome. Thanks Tom!
> >
> > I plan to open this KIP vote at the start of next week.
> > Thanks all for the discussion! Let me know if there is anything else. :)
> >
> > Justine
> >
> > On Wed, Aug 3, 2022 at 11:32 AM Tom Bentley <tb...@redhat.com> wrote:
> >
> >> Hi Justine,
> >>
> >> That all seems reasonable to me, thanks!
> >>
> >> On Wed, 3 Aug 2022 at 19:14, Justine Olshan
> <jolshan@confluent.io.invalid
> >> >
> >> wrote:
> >>
> >> > Hi Tom and Ismael,
> >> >
> >> > 1. Yes, there are definitely many ways to improve this issue and I
> plan
> >> to
> >> > write followup KIPs to address some of the larger changes.
> >> > Just wanted to get this simple fix in as a short term measure to
> prevent
> >> > issues with too many producer IDs in the cache. Stay tuned :)
> >> >
> >> > 2. I did have some offline discussion about informing the client. I
> >> think
> >> > for this specific KIP the default behavior in practice should not
> change
> >> > enough to require this information to go back to the client. In other
> >> > words, a reasonable configuration should not regress behavior.
> However,
> >> > with the further changes I mention in 1, perhaps this is something we
> >> want
> >> > to do. And yes -- unfortunately the current state of Kafka is no
> longer
> >> > totally consistent with KIP-98. This is something we probably want to
> >> > clarify in the future.
> >> >
> >> > 3. I will update the config to mention it is not dynamic. I think
> since
> >> the
> >> > transactional id configuration is read-only, this should be too.
> >> >
> >> > 4. I can update this wording.
> >> >
> >> > 5. I think there are definitely benefits to the name `
> >> > idempotent.pid.expiration.ms` but there are other ways this could
> cause
> >> > confusion. And to be clear -- the configuration can expire a producer
> ID
> >> > for a transactional producer as long as there isn't an ongoing
> >> transaction.
> >> >
> >> > Let me know if you have any questions and thanks for taking a look!
> >> >
> >> > Justine
> >> >
> >> > On Wed, Aug 3, 2022 at 9:30 AM Ismael Juma <is...@juma.me.uk> wrote:
> >> >
> >> > > Regarding 1, more can certainly be done, but I think it would be
> >> > > complementary. As such, I think this KIP stands on its own and
> >> additional
> >> > > improvements can be handled via future KIPs (unless Justine wants to
> >> > > combine things, of course).
> >> > >
> >> > > Ismael
> >> > >
> >> > > On Wed, Aug 3, 2022 at 9:12 AM Tom Bentley <tb...@redhat.com>
> >> wrote:
> >> > >
> >> > > > Hi Justine,
> >> > > >
> >> > > > Thanks for the KIP! I can see that this is a pragmatic attempt to
> >> > > address a
> >> > > > nasty problem. I have a few questions:
> >> > > >
> >> > > > 1. The KIP makes the problem significantly harder to trigger, but
> >> > doesn't
> >> > > > eliminate it entirely. How confident are you that it will be
> >> sufficient
> >> > > in
> >> > > > practice? We can point to applications which are creating
> idempotent
> >> > > > producers at a high rate and say they're broken, but that doesn't
> do
> >> > > > anything to defend the broker from an interaction pattern that
> >> differs
> >> > > only
> >> > > > in rate from a "good application". Did you consider a new quota to
> >> > limit
> >> > > > the rate at which a (principal, clientId) can allocate new PIDs?
> >> > > >
> >> > > > 2. The KIP contains this sentence: "when an idempotent producer’s
> ID
> >> > > > expires, it silently loses its idempotency guarantees." That's at
> >> odds
> >> > > with
> >> > > > my reading of "PID expiration" in the KIP-98 design[1], but it
> does
> >> > seem
> >> > > > consistent with a (brief!) look at the code. I accept that the
> risk
> >> > > should
> >> > > > be minimal so long as the expiration time is > the producer's
> >> delivery
> >> > > > timeout, but it would still be nice if we could detect this
> >> situation
> >> > and
> >> > > > return an error to the client. Is there a reason for the apparent
> >> > > deviation
> >> > > > from KIP-98 (or am I misreading the code?)
> >> > > >
> >> > > > 3. Could the KIP be explicit on whether the new config will be
> >> > > dynamically
> >> > > > changeable?
> >> > > >
> >> > > > 4. The description of producer.id.expiration.ms mentions the
> >> > > > ProducerStateManager, which will mean nothing to a normal user. We
> >> > could
> >> > > > probably change it to "a topic partition leader" without loss of
> >> > meaning.
> >> > > >
> >> > > > 5. The description also says "Producer IDs will not expire while a
> >> > > > transaction associated to them is still ongoing." To me this
> >> suggests
> >> > > that
> >> > > > a more intuitive name for this config (from the user PoV) would
> >> include
> >> > > > "idempotent", since it does not cover the transactional case. (I
> >> would
> >> > > > suggest "idempotent.pid.expiration.ms" (c.f.
> >> > > > transactional.id.expiration.ms),
> >> > > > but the distinction between "id" and "pid" is easily missed–even
> if
> >> > it's
> >> > > > technically correct–so I'm not sure it's better than what you're
> >> > > > proposing). I only mention it in case it prompts someone else to
> >> find a
> >> > > > better name.
> >> > > >
> >> > > > Thanks again,
> >> > > >
> >> > > > Tom
> >> > > >
> >> > > > [1]:
> >> > > >
> >> > > >
> >> > >
> >> >
> >>
> https://docs.google.com/document/d/11Jqy_GjUGtdXJK94XGsEIK7CP1SnQGdp2eF0wSw9ra8/edit#heading=h.loujdamc9ptj
> >> > > >
> >> > > > On Tue, 2 Aug 2022 at 22:00, Justine Olshan
> >> > <jolshan@confluent.io.invalid
> >> > > >
> >> > > > wrote:
> >> > > >
> >> > > > > I've updated the KIP to make the new minimum value 1 and remove
> >> the
> >> > -1
> >> > > > > configuration.
> >> > > > > I've also added the low priority to the configuration and edited
> >> the
> >> > > > > description as Ismael mentioned.
> >> > > > >
> >> > > > > I'm thinking about bringing this KIP to a vote soon! Let me know
> >> if
> >> > > there
> >> > > > > are any other comments or questions.
> >> > > > >
> >> > > > > Thanks,
> >> > > > > Justine
> >> > > > >
> >> > > > > On Tue, Aug 2, 2022 at 9:02 AM Jason Gustafson
> >> > > > <jason@confluent.io.invalid
> >> > > > > >
> >> > > > > wrote:
> >> > > > >
> >> > > > > > I agree with Ismael that we should remove -1. I think we tend
> to
> >> > view
> >> > > > the
> >> > > > > > coupling of these behaviors into a single configuration as a
> >> > mistake,
> >> > > > so
> >> > > > > > it's a little odd to keep it (even if in a weakened form).
> >> > > > > >
> >> > > > > > -Jason
> >> > > > > >
> >> > > > > > On Sat, Jul 30, 2022 at 7:37 AM Ismael Juma <
> ismael@juma.me.uk>
> >> > > wrote:
> >> > > > > >
> >> > > > > > > I would remove -1 altogether. Two more comments:
> >> > > > > > >
> >> > > > > > > 1. The current description kind of leads people towards
> >> aligning
> >> > > the
> >> > > > > > config
> >> > > > > > > with delivery.timeout.ms. Is that what we want? We could
> say
> >> it
> >> > > > should
> >> > > > > > be
> >> > > > > > > higher than delivery.timeout.ms but indicate that the
> >> default is
> >> > > > > usually
> >> > > > > > > fine. The main reason to reduce it would be to save memory,
> I
> >> > > guess.
> >> > > > > > > 2. Each config has a priority, we should specify it for this
> >> one.
> >> > > I'm
> >> > > > > > > assuming it will be "low".
> >> > > > > > >
> >> > > > > > > Ismael
> >> > > > > > >
> >> > > > > > > On Fri, Jul 29, 2022 at 2:38 PM Justine Olshan
> >> > > > > > > <jo...@confluent.io.invalid>
> >> > > > > > > wrote:
> >> > > > > > >
> >> > > > > > > > Hi all,
> >> > > > > > > >
> >> > > > > > > > I've updated the KIP to include the new default of 1 day
> and
> >> > > > > > information
> >> > > > > > > > about -1 in the description of the config.
> >> > > > > > > > I wonder though if including -1 makes sense now that it is
> >> not
> >> > > the
> >> > > > > > > default
> >> > > > > > > > value. Is there a benefit for manually setting -1 vs
> >> manually
> >> > > > setting
> >> > > > > > the
> >> > > > > > > > value that transactional.id.expiration.ms has?
> >> > > > > > > >
> >> > > > > > > > Let me know your thoughts.
> >> > > > > > > > Thanks,
> >> > > > > > > > Justine
> >> > > > > > > >
> >> > > > > > > > On Fri, Jul 29, 2022 at 10:54 AM Ismael Juma <
> >> > ismael@juma.me.uk>
> >> > > > > > wrote:
> >> > > > > > > >
> >> > > > > > > > > +1 for having 1 day as the default and for including
> this
> >> > > change
> >> > > > in
> >> > > > > > the
> >> > > > > > > > > release notes.
> >> > > > > > > > >
> >> > > > > > > > > Ismael
> >> > > > > > > > >
> >> > > > > > > > > On Wed, Jul 27, 2022 at 9:16 AM Jason Gustafson
> >> > > > > > > > <jason@confluent.io.invalid
> >> > > > > > > > > >
> >> > > > > > > > > wrote:
> >> > > > > > > > >
> >> > > > > > > > > > I don't think a major release is a requirement for a
> >> > default
> >> > > > > change
> >> > > > > > > for
> >> > > > > > > > > > what it's worth. I do appreciate that there is a
> >> preference
> >> > > for
> >> > > > > not
> >> > > > > > > > > rocking
> >> > > > > > > > > > the boat though. For a little bit of background here,
> >> the
> >> > > > problem
> >> > > > > > we
> >> > > > > > > > > > have encountered in production since the idempotent
> >> > producer
> >> > > > > became
> >> > > > > > > the
> >> > > > > > > > > > default is OOM errors due to huge numbers of
> producerIds
> >> > that
> >> > > > had
> >> > > > > > to
> >> > > > > > > be
> >> > > > > > > > > > retained in the cache for 7 days. It is hard to
> prevent
> >> use
> >> > > > cases
> >> > > > > > > from
> >> > > > > > > > > > emerging where producers are used and discarded
> >> rapidly. We
> >> > > > will
> >> > > > > be
> >> > > > > > > > > using a
> >> > > > > > > > > > lower value for sure, but it would also be nice to
> >> reduce
> >> > the
> >> > > > > > > > likelihood
> >> > > > > > > > > > for the community to see this problem. The benefit of
> >> the
> >> > > > caching
> >> > > > > > > > > > diminishes quickly over time since it is primarily
> >> meant to
> >> > > > > handle
> >> > > > > > > > > producer
> >> > > > > > > > > > retry windows. I do not think there is much difference
> >> > > between
> >> > > > 1
> >> > > > > > days
> >> > > > > > > > > and 7
> >> > > > > > > > > > days from an application perspective, but it is a huge
> >> > > > difference
> >> > > > > > for
> >> > > > > > > > the
> >> > > > > > > > > > broker's memory usage.
> >> > > > > > > > > >
> >> > > > > > > > > > Best,
> >> > > > > > > > > > Jason
> >> > > > > > > > > >
> >> > > > > > > > > > On Wed, Jul 27, 2022 at 2:57 AM Sagar <
> >> > > > sagarmeansocean@gmail.com
> >> > > > > >
> >> > > > > > > > wrote:
> >> > > > > > > > > >
> >> > > > > > > > > > > Thanks Justine for the KIP. I think it might be
> >> better to
> >> > > > > > document
> >> > > > > > > > the
> >> > > > > > > > > > > correlation between the new config and
> >> > delivery.timeout.ms
> >> > > > in
> >> > > > > > the
> >> > > > > > > > > Public
> >> > > > > > > > > > > Interfaces Description.
> >> > > > > > > > > > >
> >> > > > > > > > > > > Also, I agree with Luke that for now setting a
> >> default to
> >> > > -1
> >> > > > > > should
> >> > > > > > > > be
> >> > > > > > > > > > > good. We can look to switch to 1 day with major
> >> release.
> >> > > > > > > > > > >
> >> > > > > > > > > > > Thanks!
> >> > > > > > > > > > > Sagar.
> >> > > > > > > > > > >
> >> > > > > > > > > > > On Wed, Jul 27, 2022 at 9:05 AM Luke Chen <
> >> > > showuon@gmail.com
> >> > > > >
> >> > > > > > > wrote:
> >> > > > > > > > > > >
> >> > > > > > > > > > > > Hi Justine,
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > Thanks for the KIP.
> >> > > > > > > > > > > > I agree with you that we should try our best to
> keep
> >> > > > backward
> >> > > > > > > > > > > > compatibility, although our intention is to have
> >> lower
> >> > > > > producer
> >> > > > > > > id
> >> > > > > > > > > > > > expiration timeout.
> >> > > > > > > > > > > > So, I think we should keep default to -1 IMO.
> >> > > > > > > > > > > > Maybe we change the default to 1 day in next major
> >> > > release
> >> > > > > > (4.0)?
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > Thank you.
> >> > > > > > > > > > > > Luke
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > On Wed, Jul 27, 2022 at 7:13 AM Justine Olshan
> >> > > > > > > > > > > > <jo...@confluent.io.invalid>
> >> > > > > > > > > > > > wrote:
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > > Thanks for taking a look Jason!
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > I wondered if we wanted to have a smaller
> default
> >> but
> >> > > > > wasn't
> >> > > > > > > sure
> >> > > > > > > > > > about
> >> > > > > > > > > > > > the
> >> > > > > > > > > > > > > compatibility story -- especially since there is
> >> the
> >> > > > chance
> >> > > > > > for
> >> > > > > > > > > > > producer
> >> > > > > > > > > > > > > IDs to expire silently.
> >> > > > > > > > > > > > > I do think that 1 day is fairly reasonable. If I
> >> > don't
> >> > > > hear
> >> > > > > > any
> >> > > > > > > > > > > > conflicting
> >> > > > > > > > > > > > > opinions I can go ahead and update the default.
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > Justine
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > On Tue, Jul 26, 2022 at 12:23 PM Jason Gustafson
> >> > > > > > > > > > > > > <ja...@confluent.io.invalid>
> >> > > > > > > > > > > > > wrote:
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > > Hi Justine,
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > Thanks for the KIP. Although I hate seeing new
> >> > > > > > > configurations,
> >> > > > > > > > I
> >> > > > > > > > > > > think
> >> > > > > > > > > > > > > this
> >> > > > > > > > > > > > > > is a good change. Combining these timeout
> >> behaviors
> >> > > > into
> >> > > > > a
> >> > > > > > > > single
> >> > > > > > > > > > > > > > configuration was definitely a mistake, but we
> >> > didn't
> >> > > > > > > > anticipate
> >> > > > > > > > > > the
> >> > > > > > > > > > > > > > problem with the producer id cache. I do
> wonder
> >> if
> >> > we
> >> > > > can
> >> > > > > > > make
> >> > > > > > > > > the
> >> > > > > > > > > > > > > default
> >> > > > > > > > > > > > > > a bit lower to reduce the chances that users
> >> will
> >> > hit
> >> > > > the
> >> > > > > > > same
> >> > > > > > > > > > memory
> >> > > > > > > > > > > > > > issues we have seen. After decoupling this
> >> > > > configuration
> >> > > > > > from
> >> > > > > > > > > > > > > > transactional.id.expiration.ms, the new
> timeout
> >> > just
> >> > > > > needs
> >> > > > > > > to
> >> > > > > > > > > > cover
> >> > > > > > > > > > > > the
> >> > > > > > > > > > > > > > longest duration that a producer might be
> >> retrying
> >> > > the
> >> > > > > same
> >> > > > > > > > > Produce
> >> > > > > > > > > > > > > > request. 7 days seems too high. Although I
> >> think it
> >> > > > could
> >> > > > > > go
> >> > > > > > > a
> >> > > > > > > > > fair
> >> > > > > > > > > > > > even
> >> > > > > > > > > > > > > > lower, perhaps 1 day is a reasonable place to
> >> > start?
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > Thanks,
> >> > > > > > > > > > > > > > Jason
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > On Mon, Jul 25, 2022 at 9:25 AM Justine Olshan
> >> > > > > > > > > > > > > > <jo...@confluent.io.invalid>
> >> > > > > > > > > > > > > > wrote:
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > Hey Bill,
> >> > > > > > > > > > > > > > > Thanks! I was just going to say that
> hopefully
> >> > > > > > > > > > > > > > > transactional.id.expiration.ms would also
> be
> >> > over
> >> > > > the
> >> > > > > > > > delivery
> >> > > > > > > > > > > > > timeout.
> >> > > > > > > > > > > > > > :)
> >> > > > > > > > > > > > > > > Thanks for the +1!
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > Justine
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > On Mon, Jul 25, 2022 at 9:17 AM Bill Bejeck
> <
> >> > > > > > > > bbejeck@gmail.com
> >> > > > > > > > > >
> >> > > > > > > > > > > > wrote:
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > Hi Justine,
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > I just took another look at the KIP, and I
> >> > > realize
> >> > > > my
> >> > > > > > > > > > > > > > question/suggestion
> >> > > > > > > > > > > > > > > > about default values has already been
> >> addressed
> >> > > in
> >> > > > > the
> >> > > > > > > > > > > > > `Compatibility`
> >> > > > > > > > > > > > > > > > section.
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > I'm +1 on the KIP.
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > -Bill
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > On Thu, Jul 21, 2022 at 6:20 PM Bill
> Bejeck
> >> <
> >> > > > > > > > > bbejeck@gmail.com
> >> > > > > > > > > > >
> >> > > > > > > > > > > > > wrote:
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > Hi Justine,
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > Thanks for the well written KIP, this
> >> looks
> >> > > like
> >> > > > it
> >> > > > > > > will
> >> > > > > > > > > be a
> >> > > > > > > > > > > > > useful
> >> > > > > > > > > > > > > > > > > addition.
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > Overall the KIP looks good to me, I have
> >> one
> >> > > > > > > > > > question/comment.
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > You mentioned that setting the `
> >> > > > > > > > producer.id.expiration.ms`
> >> > > > > > > > > > > less
> >> > > > > > > > > > > > > than
> >> > > > > > > > > > > > > > > the
> >> > > > > > > > > > > > > > > > > delivery timeout could lead to
> duplicates,
> >> > > which
> >> > > > > > makes
> >> > > > > > > > > sense.
> >> > > > > > > > > > > To
> >> > > > > > > > > > > > > > help
> >> > > > > > > > > > > > > > > > > avoid this situation, do we want to
> >> consider
> >> > a
> >> > > > > > default
> >> > > > > > > > > value
> >> > > > > > > > > > > that
> >> > > > > > > > > > > > > is
> >> > > > > > > > > > > > > > > the
> >> > > > > > > > > > > > > > > > > same as the delivery timeout?
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > Thanks again for the KIP.
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > Bill
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > On Thu, Jul 21, 2022 at 4:54 PM Justine
> >> > Olshan
> >> > > > > > > > > > > > > > > > > <jo...@confluent.io.invalid> wrote:
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > >> Hey all!
> >> > > > > > > > > > > > > > > > >>
> >> > > > > > > > > > > > > > > > >> I'd like to start a discussion on my
> >> > proposal
> >> > > to
> >> > > > > > > > separate
> >> > > > > > > > > > > > > time-based
> >> > > > > > > > > > > > > > > > >> producer ID expiration from
> >> transactional ID
> >> > > > > > > expiration
> >> > > > > > > > by
> >> > > > > > > > > > > > > > > introducing a
> >> > > > > > > > > > > > > > > > >> new configuration.
> >> > > > > > > > > > > > > > > > >>
> >> > > > > > > > > > > > > > > > >> The KIP Is pretty small and simple, but
> >> will
> >> > > be
> >> > > > > > > helpful
> >> > > > > > > > in
> >> > > > > > > > > > > > > > controlling
> >> > > > > > > > > > > > > > > > >> memory usage in brokers -- especially
> now
> >> > that
> >> > > > by
> >> > > > > > > > default
> >> > > > > > > > > > > > > producers
> >> > > > > > > > > > > > > > > are
> >> > > > > > > > > > > > > > > > >> idempotent and create producer ID
> state.
> >> > > > > > > > > > > > > > > > >>
> >> > > > > > > > > > > > > > > > >> Please take a look and leave any
> comments
> >> > you
> >> > > > may
> >> > > > > > > have!
> >> > > > > > > > > > > > > > > > >>
> >> > > > > > > > > > > > > > > > >> KIP:
> >> > > > > > > > > > > > > > > > >>
> >> > > > > > > > > > > > > > > > >>
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-854+Separate+configuration+for+producer+ID+expiry
> >> > > > > > > > > > > > > > > > >> JIRA:
> >> > > > > > > https://issues.apache.org/jira/browse/KAFKA-14097
> >> > > > > > > > > > > > > > > > >>
> >> > > > > > > > > > > > > > > > >> Thanks!
> >> > > > > > > > > > > > > > > > >> Justine
> >> > > > > > > > > > > > > > > > >>
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> >
>

Re: [DISCUSS] KIP-854 Separate configuration for producer ID expiry

Posted by Justine Olshan <jo...@confluent.io.INVALID>.
Hey all,
Quick update to this KIP. While working on the PR and tests I realized that
we have a hardcoded value for how often we clean up producer IDs. Currently
this value is 10 minutes and is defined in LogManager.
I thought for better testing and ease of use of the new configuration, we
should also be able to configure the cleanup interval.

Here is the new configuration I'm hoping to add. I also added it to the KIP.
name: producer.id.expiration.check.interval.ms
description: The interval at which to remove producer IDs that have expired
due to producer.id.expiration.ms passing
default: 600000 (10 minutes)
valid values: [1,...]
priority: low
update-mode: read-only

I left the default as the current hardcoded value to avoid disruptions. If
there are any issues with this change let me know.
Thanks,
Justine

On Fri, Aug 5, 2022 at 1:40 PM Justine Olshan <jo...@confluent.io> wrote:

> Awesome. Thanks Tom!
>
> I plan to open this KIP vote at the start of next week.
> Thanks all for the discussion! Let me know if there is anything else. :)
>
> Justine
>
> On Wed, Aug 3, 2022 at 11:32 AM Tom Bentley <tb...@redhat.com> wrote:
>
>> Hi Justine,
>>
>> That all seems reasonable to me, thanks!
>>
>> On Wed, 3 Aug 2022 at 19:14, Justine Olshan <jolshan@confluent.io.invalid
>> >
>> wrote:
>>
>> > Hi Tom and Ismael,
>> >
>> > 1. Yes, there are definitely many ways to improve this issue and I plan
>> to
>> > write followup KIPs to address some of the larger changes.
>> > Just wanted to get this simple fix in as a short term measure to prevent
>> > issues with too many producer IDs in the cache. Stay tuned :)
>> >
>> > 2. I did have some offline discussion about informing the client. I
>> think
>> > for this specific KIP the default behavior in practice should not change
>> > enough to require this information to go back to the client. In other
>> > words, a reasonable configuration should not regress behavior. However,
>> > with the further changes I mention in 1, perhaps this is something we
>> want
>> > to do. And yes -- unfortunately the current state of Kafka is no longer
>> > totally consistent with KIP-98. This is something we probably want to
>> > clarify in the future.
>> >
>> > 3. I will update the config to mention it is not dynamic. I think since
>> the
>> > transactional id configuration is read-only, this should be too.
>> >
>> > 4. I can update this wording.
>> >
>> > 5. I think there are definitely benefits to the name `
>> > idempotent.pid.expiration.ms` but there are other ways this could cause
>> > confusion. And to be clear -- the configuration can expire a producer ID
>> > for a transactional producer as long as there isn't an ongoing
>> transaction.
>> >
>> > Let me know if you have any questions and thanks for taking a look!
>> >
>> > Justine
>> >
>> > On Wed, Aug 3, 2022 at 9:30 AM Ismael Juma <is...@juma.me.uk> wrote:
>> >
>> > > Regarding 1, more can certainly be done, but I think it would be
>> > > complementary. As such, I think this KIP stands on its own and
>> additional
>> > > improvements can be handled via future KIPs (unless Justine wants to
>> > > combine things, of course).
>> > >
>> > > Ismael
>> > >
>> > > On Wed, Aug 3, 2022 at 9:12 AM Tom Bentley <tb...@redhat.com>
>> wrote:
>> > >
>> > > > Hi Justine,
>> > > >
>> > > > Thanks for the KIP! I can see that this is a pragmatic attempt to
>> > > address a
>> > > > nasty problem. I have a few questions:
>> > > >
>> > > > 1. The KIP makes the problem significantly harder to trigger, but
>> > doesn't
>> > > > eliminate it entirely. How confident are you that it will be
>> sufficient
>> > > in
>> > > > practice? We can point to applications which are creating idempotent
>> > > > producers at a high rate and say they're broken, but that doesn't do
>> > > > anything to defend the broker from an interaction pattern that
>> differs
>> > > only
>> > > > in rate from a "good application". Did you consider a new quota to
>> > limit
>> > > > the rate at which a (principal, clientId) can allocate new PIDs?
>> > > >
>> > > > 2. The KIP contains this sentence: "when an idempotent producer’s ID
>> > > > expires, it silently loses its idempotency guarantees." That's at
>> odds
>> > > with
>> > > > my reading of "PID expiration" in the KIP-98 design[1], but it does
>> > seem
>> > > > consistent with a (brief!) look at the code. I accept that the risk
>> > > should
>> > > > be minimal so long as the expiration time is > the producer's
>> delivery
>> > > > timeout, but it would still be nice if we could detect this
>> situation
>> > and
>> > > > return an error to the client. Is there a reason for the apparent
>> > > deviation
>> > > > from KIP-98 (or am I misreading the code?)
>> > > >
>> > > > 3. Could the KIP be explicit on whether the new config will be
>> > > dynamically
>> > > > changeable?
>> > > >
>> > > > 4. The description of producer.id.expiration.ms mentions the
>> > > > ProducerStateManager, which will mean nothing to a normal user. We
>> > could
>> > > > probably change it to "a topic partition leader" without loss of
>> > meaning.
>> > > >
>> > > > 5. The description also says "Producer IDs will not expire while a
>> > > > transaction associated to them is still ongoing." To me this
>> suggests
>> > > that
>> > > > a more intuitive name for this config (from the user PoV) would
>> include
>> > > > "idempotent", since it does not cover the transactional case. (I
>> would
>> > > > suggest "idempotent.pid.expiration.ms" (c.f.
>> > > > transactional.id.expiration.ms),
>> > > > but the distinction between "id" and "pid" is easily missed–even if
>> > it's
>> > > > technically correct–so I'm not sure it's better than what you're
>> > > > proposing). I only mention it in case it prompts someone else to
>> find a
>> > > > better name.
>> > > >
>> > > > Thanks again,
>> > > >
>> > > > Tom
>> > > >
>> > > > [1]:
>> > > >
>> > > >
>> > >
>> >
>> https://docs.google.com/document/d/11Jqy_GjUGtdXJK94XGsEIK7CP1SnQGdp2eF0wSw9ra8/edit#heading=h.loujdamc9ptj
>> > > >
>> > > > On Tue, 2 Aug 2022 at 22:00, Justine Olshan
>> > <jolshan@confluent.io.invalid
>> > > >
>> > > > wrote:
>> > > >
>> > > > > I've updated the KIP to make the new minimum value 1 and remove
>> the
>> > -1
>> > > > > configuration.
>> > > > > I've also added the low priority to the configuration and edited
>> the
>> > > > > description as Ismael mentioned.
>> > > > >
>> > > > > I'm thinking about bringing this KIP to a vote soon! Let me know
>> if
>> > > there
>> > > > > are any other comments or questions.
>> > > > >
>> > > > > Thanks,
>> > > > > Justine
>> > > > >
>> > > > > On Tue, Aug 2, 2022 at 9:02 AM Jason Gustafson
>> > > > <jason@confluent.io.invalid
>> > > > > >
>> > > > > wrote:
>> > > > >
>> > > > > > I agree with Ismael that we should remove -1. I think we tend to
>> > view
>> > > > the
>> > > > > > coupling of these behaviors into a single configuration as a
>> > mistake,
>> > > > so
>> > > > > > it's a little odd to keep it (even if in a weakened form).
>> > > > > >
>> > > > > > -Jason
>> > > > > >
>> > > > > > On Sat, Jul 30, 2022 at 7:37 AM Ismael Juma <is...@juma.me.uk>
>> > > wrote:
>> > > > > >
>> > > > > > > I would remove -1 altogether. Two more comments:
>> > > > > > >
>> > > > > > > 1. The current description kind of leads people towards
>> aligning
>> > > the
>> > > > > > config
>> > > > > > > with delivery.timeout.ms. Is that what we want? We could say
>> it
>> > > > should
>> > > > > > be
>> > > > > > > higher than delivery.timeout.ms but indicate that the
>> default is
>> > > > > usually
>> > > > > > > fine. The main reason to reduce it would be to save memory, I
>> > > guess.
>> > > > > > > 2. Each config has a priority, we should specify it for this
>> one.
>> > > I'm
>> > > > > > > assuming it will be "low".
>> > > > > > >
>> > > > > > > Ismael
>> > > > > > >
>> > > > > > > On Fri, Jul 29, 2022 at 2:38 PM Justine Olshan
>> > > > > > > <jo...@confluent.io.invalid>
>> > > > > > > wrote:
>> > > > > > >
>> > > > > > > > Hi all,
>> > > > > > > >
>> > > > > > > > I've updated the KIP to include the new default of 1 day and
>> > > > > > information
>> > > > > > > > about -1 in the description of the config.
>> > > > > > > > I wonder though if including -1 makes sense now that it is
>> not
>> > > the
>> > > > > > > default
>> > > > > > > > value. Is there a benefit for manually setting -1 vs
>> manually
>> > > > setting
>> > > > > > the
>> > > > > > > > value that transactional.id.expiration.ms has?
>> > > > > > > >
>> > > > > > > > Let me know your thoughts.
>> > > > > > > > Thanks,
>> > > > > > > > Justine
>> > > > > > > >
>> > > > > > > > On Fri, Jul 29, 2022 at 10:54 AM Ismael Juma <
>> > ismael@juma.me.uk>
>> > > > > > wrote:
>> > > > > > > >
>> > > > > > > > > +1 for having 1 day as the default and for including this
>> > > change
>> > > > in
>> > > > > > the
>> > > > > > > > > release notes.
>> > > > > > > > >
>> > > > > > > > > Ismael
>> > > > > > > > >
>> > > > > > > > > On Wed, Jul 27, 2022 at 9:16 AM Jason Gustafson
>> > > > > > > > <jason@confluent.io.invalid
>> > > > > > > > > >
>> > > > > > > > > wrote:
>> > > > > > > > >
>> > > > > > > > > > I don't think a major release is a requirement for a
>> > default
>> > > > > change
>> > > > > > > for
>> > > > > > > > > > what it's worth. I do appreciate that there is a
>> preference
>> > > for
>> > > > > not
>> > > > > > > > > rocking
>> > > > > > > > > > the boat though. For a little bit of background here,
>> the
>> > > > problem
>> > > > > > we
>> > > > > > > > > > have encountered in production since the idempotent
>> > producer
>> > > > > became
>> > > > > > > the
>> > > > > > > > > > default is OOM errors due to huge numbers of producerIds
>> > that
>> > > > had
>> > > > > > to
>> > > > > > > be
>> > > > > > > > > > retained in the cache for 7 days. It is hard to prevent
>> use
>> > > > cases
>> > > > > > > from
>> > > > > > > > > > emerging where producers are used and discarded
>> rapidly. We
>> > > > will
>> > > > > be
>> > > > > > > > > using a
>> > > > > > > > > > lower value for sure, but it would also be nice to
>> reduce
>> > the
>> > > > > > > > likelihood
>> > > > > > > > > > for the community to see this problem. The benefit of
>> the
>> > > > caching
>> > > > > > > > > > diminishes quickly over time since it is primarily
>> meant to
>> > > > > handle
>> > > > > > > > > producer
>> > > > > > > > > > retry windows. I do not think there is much difference
>> > > between
>> > > > 1
>> > > > > > days
>> > > > > > > > > and 7
>> > > > > > > > > > days from an application perspective, but it is a huge
>> > > > difference
>> > > > > > for
>> > > > > > > > the
>> > > > > > > > > > broker's memory usage.
>> > > > > > > > > >
>> > > > > > > > > > Best,
>> > > > > > > > > > Jason
>> > > > > > > > > >
>> > > > > > > > > > On Wed, Jul 27, 2022 at 2:57 AM Sagar <
>> > > > sagarmeansocean@gmail.com
>> > > > > >
>> > > > > > > > wrote:
>> > > > > > > > > >
>> > > > > > > > > > > Thanks Justine for the KIP. I think it might be
>> better to
>> > > > > > document
>> > > > > > > > the
>> > > > > > > > > > > correlation between the new config and
>> > delivery.timeout.ms
>> > > > in
>> > > > > > the
>> > > > > > > > > Public
>> > > > > > > > > > > Interfaces Description.
>> > > > > > > > > > >
>> > > > > > > > > > > Also, I agree with Luke that for now setting a
>> default to
>> > > -1
>> > > > > > should
>> > > > > > > > be
>> > > > > > > > > > > good. We can look to switch to 1 day with major
>> release.
>> > > > > > > > > > >
>> > > > > > > > > > > Thanks!
>> > > > > > > > > > > Sagar.
>> > > > > > > > > > >
>> > > > > > > > > > > On Wed, Jul 27, 2022 at 9:05 AM Luke Chen <
>> > > showuon@gmail.com
>> > > > >
>> > > > > > > wrote:
>> > > > > > > > > > >
>> > > > > > > > > > > > Hi Justine,
>> > > > > > > > > > > >
>> > > > > > > > > > > > Thanks for the KIP.
>> > > > > > > > > > > > I agree with you that we should try our best to keep
>> > > > backward
>> > > > > > > > > > > > compatibility, although our intention is to have
>> lower
>> > > > > producer
>> > > > > > > id
>> > > > > > > > > > > > expiration timeout.
>> > > > > > > > > > > > So, I think we should keep default to -1 IMO.
>> > > > > > > > > > > > Maybe we change the default to 1 day in next major
>> > > release
>> > > > > > (4.0)?
>> > > > > > > > > > > >
>> > > > > > > > > > > > Thank you.
>> > > > > > > > > > > > Luke
>> > > > > > > > > > > >
>> > > > > > > > > > > > On Wed, Jul 27, 2022 at 7:13 AM Justine Olshan
>> > > > > > > > > > > > <jo...@confluent.io.invalid>
>> > > > > > > > > > > > wrote:
>> > > > > > > > > > > >
>> > > > > > > > > > > > > Thanks for taking a look Jason!
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > I wondered if we wanted to have a smaller default
>> but
>> > > > > wasn't
>> > > > > > > sure
>> > > > > > > > > > about
>> > > > > > > > > > > > the
>> > > > > > > > > > > > > compatibility story -- especially since there is
>> the
>> > > > chance
>> > > > > > for
>> > > > > > > > > > > producer
>> > > > > > > > > > > > > IDs to expire silently.
>> > > > > > > > > > > > > I do think that 1 day is fairly reasonable. If I
>> > don't
>> > > > hear
>> > > > > > any
>> > > > > > > > > > > > conflicting
>> > > > > > > > > > > > > opinions I can go ahead and update the default.
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > Justine
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > On Tue, Jul 26, 2022 at 12:23 PM Jason Gustafson
>> > > > > > > > > > > > > <ja...@confluent.io.invalid>
>> > > > > > > > > > > > > wrote:
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > > Hi Justine,
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > Thanks for the KIP. Although I hate seeing new
>> > > > > > > configurations,
>> > > > > > > > I
>> > > > > > > > > > > think
>> > > > > > > > > > > > > this
>> > > > > > > > > > > > > > is a good change. Combining these timeout
>> behaviors
>> > > > into
>> > > > > a
>> > > > > > > > single
>> > > > > > > > > > > > > > configuration was definitely a mistake, but we
>> > didn't
>> > > > > > > > anticipate
>> > > > > > > > > > the
>> > > > > > > > > > > > > > problem with the producer id cache. I do wonder
>> if
>> > we
>> > > > can
>> > > > > > > make
>> > > > > > > > > the
>> > > > > > > > > > > > > default
>> > > > > > > > > > > > > > a bit lower to reduce the chances that users
>> will
>> > hit
>> > > > the
>> > > > > > > same
>> > > > > > > > > > memory
>> > > > > > > > > > > > > > issues we have seen. After decoupling this
>> > > > configuration
>> > > > > > from
>> > > > > > > > > > > > > > transactional.id.expiration.ms, the new timeout
>> > just
>> > > > > needs
>> > > > > > > to
>> > > > > > > > > > cover
>> > > > > > > > > > > > the
>> > > > > > > > > > > > > > longest duration that a producer might be
>> retrying
>> > > the
>> > > > > same
>> > > > > > > > > Produce
>> > > > > > > > > > > > > > request. 7 days seems too high. Although I
>> think it
>> > > > could
>> > > > > > go
>> > > > > > > a
>> > > > > > > > > fair
>> > > > > > > > > > > > even
>> > > > > > > > > > > > > > lower, perhaps 1 day is a reasonable place to
>> > start?
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > Thanks,
>> > > > > > > > > > > > > > Jason
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > On Mon, Jul 25, 2022 at 9:25 AM Justine Olshan
>> > > > > > > > > > > > > > <jo...@confluent.io.invalid>
>> > > > > > > > > > > > > > wrote:
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > Hey Bill,
>> > > > > > > > > > > > > > > Thanks! I was just going to say that hopefully
>> > > > > > > > > > > > > > > transactional.id.expiration.ms would also be
>> > over
>> > > > the
>> > > > > > > > delivery
>> > > > > > > > > > > > > timeout.
>> > > > > > > > > > > > > > :)
>> > > > > > > > > > > > > > > Thanks for the +1!
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > Justine
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > On Mon, Jul 25, 2022 at 9:17 AM Bill Bejeck <
>> > > > > > > > bbejeck@gmail.com
>> > > > > > > > > >
>> > > > > > > > > > > > wrote:
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > Hi Justine,
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > I just took another look at the KIP, and I
>> > > realize
>> > > > my
>> > > > > > > > > > > > > > question/suggestion
>> > > > > > > > > > > > > > > > about default values has already been
>> addressed
>> > > in
>> > > > > the
>> > > > > > > > > > > > > `Compatibility`
>> > > > > > > > > > > > > > > > section.
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > I'm +1 on the KIP.
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > -Bill
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > On Thu, Jul 21, 2022 at 6:20 PM Bill Bejeck
>> <
>> > > > > > > > > bbejeck@gmail.com
>> > > > > > > > > > >
>> > > > > > > > > > > > > wrote:
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > Hi Justine,
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > Thanks for the well written KIP, this
>> looks
>> > > like
>> > > > it
>> > > > > > > will
>> > > > > > > > > be a
>> > > > > > > > > > > > > useful
>> > > > > > > > > > > > > > > > > addition.
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > Overall the KIP looks good to me, I have
>> one
>> > > > > > > > > > question/comment.
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > You mentioned that setting the `
>> > > > > > > > producer.id.expiration.ms`
>> > > > > > > > > > > less
>> > > > > > > > > > > > > than
>> > > > > > > > > > > > > > > the
>> > > > > > > > > > > > > > > > > delivery timeout could lead to duplicates,
>> > > which
>> > > > > > makes
>> > > > > > > > > sense.
>> > > > > > > > > > > To
>> > > > > > > > > > > > > > help
>> > > > > > > > > > > > > > > > > avoid this situation, do we want to
>> consider
>> > a
>> > > > > > default
>> > > > > > > > > value
>> > > > > > > > > > > that
>> > > > > > > > > > > > > is
>> > > > > > > > > > > > > > > the
>> > > > > > > > > > > > > > > > > same as the delivery timeout?
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > Thanks again for the KIP.
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > Bill
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > On Thu, Jul 21, 2022 at 4:54 PM Justine
>> > Olshan
>> > > > > > > > > > > > > > > > > <jo...@confluent.io.invalid> wrote:
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > >> Hey all!
>> > > > > > > > > > > > > > > > >>
>> > > > > > > > > > > > > > > > >> I'd like to start a discussion on my
>> > proposal
>> > > to
>> > > > > > > > separate
>> > > > > > > > > > > > > time-based
>> > > > > > > > > > > > > > > > >> producer ID expiration from
>> transactional ID
>> > > > > > > expiration
>> > > > > > > > by
>> > > > > > > > > > > > > > > introducing a
>> > > > > > > > > > > > > > > > >> new configuration.
>> > > > > > > > > > > > > > > > >>
>> > > > > > > > > > > > > > > > >> The KIP Is pretty small and simple, but
>> will
>> > > be
>> > > > > > > helpful
>> > > > > > > > in
>> > > > > > > > > > > > > > controlling
>> > > > > > > > > > > > > > > > >> memory usage in brokers -- especially now
>> > that
>> > > > by
>> > > > > > > > default
>> > > > > > > > > > > > > producers
>> > > > > > > > > > > > > > > are
>> > > > > > > > > > > > > > > > >> idempotent and create producer ID state.
>> > > > > > > > > > > > > > > > >>
>> > > > > > > > > > > > > > > > >> Please take a look and leave any comments
>> > you
>> > > > may
>> > > > > > > have!
>> > > > > > > > > > > > > > > > >>
>> > > > > > > > > > > > > > > > >> KIP:
>> > > > > > > > > > > > > > > > >>
>> > > > > > > > > > > > > > > > >>
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-854+Separate+configuration+for+producer+ID+expiry
>> > > > > > > > > > > > > > > > >> JIRA:
>> > > > > > > https://issues.apache.org/jira/browse/KAFKA-14097
>> > > > > > > > > > > > > > > > >>
>> > > > > > > > > > > > > > > > >> Thanks!
>> > > > > > > > > > > > > > > > >> Justine
>> > > > > > > > > > > > > > > > >>
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>

Re: [DISCUSS] KIP-854 Separate configuration for producer ID expiry

Posted by Justine Olshan <jo...@confluent.io.INVALID>.
Awesome. Thanks Tom!

I plan to open this KIP vote at the start of next week.
Thanks all for the discussion! Let me know if there is anything else. :)

Justine

On Wed, Aug 3, 2022 at 11:32 AM Tom Bentley <tb...@redhat.com> wrote:

> Hi Justine,
>
> That all seems reasonable to me, thanks!
>
> On Wed, 3 Aug 2022 at 19:14, Justine Olshan <jo...@confluent.io.invalid>
> wrote:
>
> > Hi Tom and Ismael,
> >
> > 1. Yes, there are definitely many ways to improve this issue and I plan
> to
> > write followup KIPs to address some of the larger changes.
> > Just wanted to get this simple fix in as a short term measure to prevent
> > issues with too many producer IDs in the cache. Stay tuned :)
> >
> > 2. I did have some offline discussion about informing the client. I think
> > for this specific KIP the default behavior in practice should not change
> > enough to require this information to go back to the client. In other
> > words, a reasonable configuration should not regress behavior. However,
> > with the further changes I mention in 1, perhaps this is something we
> want
> > to do. And yes -- unfortunately the current state of Kafka is no longer
> > totally consistent with KIP-98. This is something we probably want to
> > clarify in the future.
> >
> > 3. I will update the config to mention it is not dynamic. I think since
> the
> > transactional id configuration is read-only, this should be too.
> >
> > 4. I can update this wording.
> >
> > 5. I think there are definitely benefits to the name `
> > idempotent.pid.expiration.ms` but there are other ways this could cause
> > confusion. And to be clear -- the configuration can expire a producer ID
> > for a transactional producer as long as there isn't an ongoing
> transaction.
> >
> > Let me know if you have any questions and thanks for taking a look!
> >
> > Justine
> >
> > On Wed, Aug 3, 2022 at 9:30 AM Ismael Juma <is...@juma.me.uk> wrote:
> >
> > > Regarding 1, more can certainly be done, but I think it would be
> > > complementary. As such, I think this KIP stands on its own and
> additional
> > > improvements can be handled via future KIPs (unless Justine wants to
> > > combine things, of course).
> > >
> > > Ismael
> > >
> > > On Wed, Aug 3, 2022 at 9:12 AM Tom Bentley <tb...@redhat.com>
> wrote:
> > >
> > > > Hi Justine,
> > > >
> > > > Thanks for the KIP! I can see that this is a pragmatic attempt to
> > > address a
> > > > nasty problem. I have a few questions:
> > > >
> > > > 1. The KIP makes the problem significantly harder to trigger, but
> > doesn't
> > > > eliminate it entirely. How confident are you that it will be
> sufficient
> > > in
> > > > practice? We can point to applications which are creating idempotent
> > > > producers at a high rate and say they're broken, but that doesn't do
> > > > anything to defend the broker from an interaction pattern that
> differs
> > > only
> > > > in rate from a "good application". Did you consider a new quota to
> > limit
> > > > the rate at which a (principal, clientId) can allocate new PIDs?
> > > >
> > > > 2. The KIP contains this sentence: "when an idempotent producer’s ID
> > > > expires, it silently loses its idempotency guarantees." That's at
> odds
> > > with
> > > > my reading of "PID expiration" in the KIP-98 design[1], but it does
> > seem
> > > > consistent with a (brief!) look at the code. I accept that the risk
> > > should
> > > > be minimal so long as the expiration time is > the producer's
> delivery
> > > > timeout, but it would still be nice if we could detect this situation
> > and
> > > > return an error to the client. Is there a reason for the apparent
> > > deviation
> > > > from KIP-98 (or am I misreading the code?)
> > > >
> > > > 3. Could the KIP be explicit on whether the new config will be
> > > dynamically
> > > > changeable?
> > > >
> > > > 4. The description of producer.id.expiration.ms mentions the
> > > > ProducerStateManager, which will mean nothing to a normal user. We
> > could
> > > > probably change it to "a topic partition leader" without loss of
> > meaning.
> > > >
> > > > 5. The description also says "Producer IDs will not expire while a
> > > > transaction associated to them is still ongoing." To me this suggests
> > > that
> > > > a more intuitive name for this config (from the user PoV) would
> include
> > > > "idempotent", since it does not cover the transactional case. (I
> would
> > > > suggest "idempotent.pid.expiration.ms" (c.f.
> > > > transactional.id.expiration.ms),
> > > > but the distinction between "id" and "pid" is easily missed–even if
> > it's
> > > > technically correct–so I'm not sure it's better than what you're
> > > > proposing). I only mention it in case it prompts someone else to
> find a
> > > > better name.
> > > >
> > > > Thanks again,
> > > >
> > > > Tom
> > > >
> > > > [1]:
> > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/11Jqy_GjUGtdXJK94XGsEIK7CP1SnQGdp2eF0wSw9ra8/edit#heading=h.loujdamc9ptj
> > > >
> > > > On Tue, 2 Aug 2022 at 22:00, Justine Olshan
> > <jolshan@confluent.io.invalid
> > > >
> > > > wrote:
> > > >
> > > > > I've updated the KIP to make the new minimum value 1 and remove the
> > -1
> > > > > configuration.
> > > > > I've also added the low priority to the configuration and edited
> the
> > > > > description as Ismael mentioned.
> > > > >
> > > > > I'm thinking about bringing this KIP to a vote soon! Let me know if
> > > there
> > > > > are any other comments or questions.
> > > > >
> > > > > Thanks,
> > > > > Justine
> > > > >
> > > > > On Tue, Aug 2, 2022 at 9:02 AM Jason Gustafson
> > > > <jason@confluent.io.invalid
> > > > > >
> > > > > wrote:
> > > > >
> > > > > > I agree with Ismael that we should remove -1. I think we tend to
> > view
> > > > the
> > > > > > coupling of these behaviors into a single configuration as a
> > mistake,
> > > > so
> > > > > > it's a little odd to keep it (even if in a weakened form).
> > > > > >
> > > > > > -Jason
> > > > > >
> > > > > > On Sat, Jul 30, 2022 at 7:37 AM Ismael Juma <is...@juma.me.uk>
> > > wrote:
> > > > > >
> > > > > > > I would remove -1 altogether. Two more comments:
> > > > > > >
> > > > > > > 1. The current description kind of leads people towards
> aligning
> > > the
> > > > > > config
> > > > > > > with delivery.timeout.ms. Is that what we want? We could say
> it
> > > > should
> > > > > > be
> > > > > > > higher than delivery.timeout.ms but indicate that the default
> is
> > > > > usually
> > > > > > > fine. The main reason to reduce it would be to save memory, I
> > > guess.
> > > > > > > 2. Each config has a priority, we should specify it for this
> one.
> > > I'm
> > > > > > > assuming it will be "low".
> > > > > > >
> > > > > > > Ismael
> > > > > > >
> > > > > > > On Fri, Jul 29, 2022 at 2:38 PM Justine Olshan
> > > > > > > <jo...@confluent.io.invalid>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi all,
> > > > > > > >
> > > > > > > > I've updated the KIP to include the new default of 1 day and
> > > > > > information
> > > > > > > > about -1 in the description of the config.
> > > > > > > > I wonder though if including -1 makes sense now that it is
> not
> > > the
> > > > > > > default
> > > > > > > > value. Is there a benefit for manually setting -1 vs manually
> > > > setting
> > > > > > the
> > > > > > > > value that transactional.id.expiration.ms has?
> > > > > > > >
> > > > > > > > Let me know your thoughts.
> > > > > > > > Thanks,
> > > > > > > > Justine
> > > > > > > >
> > > > > > > > On Fri, Jul 29, 2022 at 10:54 AM Ismael Juma <
> > ismael@juma.me.uk>
> > > > > > wrote:
> > > > > > > >
> > > > > > > > > +1 for having 1 day as the default and for including this
> > > change
> > > > in
> > > > > > the
> > > > > > > > > release notes.
> > > > > > > > >
> > > > > > > > > Ismael
> > > > > > > > >
> > > > > > > > > On Wed, Jul 27, 2022 at 9:16 AM Jason Gustafson
> > > > > > > > <jason@confluent.io.invalid
> > > > > > > > > >
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > I don't think a major release is a requirement for a
> > default
> > > > > change
> > > > > > > for
> > > > > > > > > > what it's worth. I do appreciate that there is a
> preference
> > > for
> > > > > not
> > > > > > > > > rocking
> > > > > > > > > > the boat though. For a little bit of background here, the
> > > > problem
> > > > > > we
> > > > > > > > > > have encountered in production since the idempotent
> > producer
> > > > > became
> > > > > > > the
> > > > > > > > > > default is OOM errors due to huge numbers of producerIds
> > that
> > > > had
> > > > > > to
> > > > > > > be
> > > > > > > > > > retained in the cache for 7 days. It is hard to prevent
> use
> > > > cases
> > > > > > > from
> > > > > > > > > > emerging where producers are used and discarded rapidly.
> We
> > > > will
> > > > > be
> > > > > > > > > using a
> > > > > > > > > > lower value for sure, but it would also be nice to reduce
> > the
> > > > > > > > likelihood
> > > > > > > > > > for the community to see this problem. The benefit of the
> > > > caching
> > > > > > > > > > diminishes quickly over time since it is primarily meant
> to
> > > > > handle
> > > > > > > > > producer
> > > > > > > > > > retry windows. I do not think there is much difference
> > > between
> > > > 1
> > > > > > days
> > > > > > > > > and 7
> > > > > > > > > > days from an application perspective, but it is a huge
> > > > difference
> > > > > > for
> > > > > > > > the
> > > > > > > > > > broker's memory usage.
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Jason
> > > > > > > > > >
> > > > > > > > > > On Wed, Jul 27, 2022 at 2:57 AM Sagar <
> > > > sagarmeansocean@gmail.com
> > > > > >
> > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Thanks Justine for the KIP. I think it might be better
> to
> > > > > > document
> > > > > > > > the
> > > > > > > > > > > correlation between the new config and
> > delivery.timeout.ms
> > > > in
> > > > > > the
> > > > > > > > > Public
> > > > > > > > > > > Interfaces Description.
> > > > > > > > > > >
> > > > > > > > > > > Also, I agree with Luke that for now setting a default
> to
> > > -1
> > > > > > should
> > > > > > > > be
> > > > > > > > > > > good. We can look to switch to 1 day with major
> release.
> > > > > > > > > > >
> > > > > > > > > > > Thanks!
> > > > > > > > > > > Sagar.
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Jul 27, 2022 at 9:05 AM Luke Chen <
> > > showuon@gmail.com
> > > > >
> > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi Justine,
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks for the KIP.
> > > > > > > > > > > > I agree with you that we should try our best to keep
> > > > backward
> > > > > > > > > > > > compatibility, although our intention is to have
> lower
> > > > > producer
> > > > > > > id
> > > > > > > > > > > > expiration timeout.
> > > > > > > > > > > > So, I think we should keep default to -1 IMO.
> > > > > > > > > > > > Maybe we change the default to 1 day in next major
> > > release
> > > > > > (4.0)?
> > > > > > > > > > > >
> > > > > > > > > > > > Thank you.
> > > > > > > > > > > > Luke
> > > > > > > > > > > >
> > > > > > > > > > > > On Wed, Jul 27, 2022 at 7:13 AM Justine Olshan
> > > > > > > > > > > > <jo...@confluent.io.invalid>
> > > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Thanks for taking a look Jason!
> > > > > > > > > > > > >
> > > > > > > > > > > > > I wondered if we wanted to have a smaller default
> but
> > > > > wasn't
> > > > > > > sure
> > > > > > > > > > about
> > > > > > > > > > > > the
> > > > > > > > > > > > > compatibility story -- especially since there is
> the
> > > > chance
> > > > > > for
> > > > > > > > > > > producer
> > > > > > > > > > > > > IDs to expire silently.
> > > > > > > > > > > > > I do think that 1 day is fairly reasonable. If I
> > don't
> > > > hear
> > > > > > any
> > > > > > > > > > > > conflicting
> > > > > > > > > > > > > opinions I can go ahead and update the default.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Justine
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Tue, Jul 26, 2022 at 12:23 PM Jason Gustafson
> > > > > > > > > > > > > <ja...@confluent.io.invalid>
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi Justine,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks for the KIP. Although I hate seeing new
> > > > > > > configurations,
> > > > > > > > I
> > > > > > > > > > > think
> > > > > > > > > > > > > this
> > > > > > > > > > > > > > is a good change. Combining these timeout
> behaviors
> > > > into
> > > > > a
> > > > > > > > single
> > > > > > > > > > > > > > configuration was definitely a mistake, but we
> > didn't
> > > > > > > > anticipate
> > > > > > > > > > the
> > > > > > > > > > > > > > problem with the producer id cache. I do wonder
> if
> > we
> > > > can
> > > > > > > make
> > > > > > > > > the
> > > > > > > > > > > > > default
> > > > > > > > > > > > > > a bit lower to reduce the chances that users will
> > hit
> > > > the
> > > > > > > same
> > > > > > > > > > memory
> > > > > > > > > > > > > > issues we have seen. After decoupling this
> > > > configuration
> > > > > > from
> > > > > > > > > > > > > > transactional.id.expiration.ms, the new timeout
> > just
> > > > > needs
> > > > > > > to
> > > > > > > > > > cover
> > > > > > > > > > > > the
> > > > > > > > > > > > > > longest duration that a producer might be
> retrying
> > > the
> > > > > same
> > > > > > > > > Produce
> > > > > > > > > > > > > > request. 7 days seems too high. Although I think
> it
> > > > could
> > > > > > go
> > > > > > > a
> > > > > > > > > fair
> > > > > > > > > > > > even
> > > > > > > > > > > > > > lower, perhaps 1 day is a reasonable place to
> > start?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > Jason
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Mon, Jul 25, 2022 at 9:25 AM Justine Olshan
> > > > > > > > > > > > > > <jo...@confluent.io.invalid>
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hey Bill,
> > > > > > > > > > > > > > > Thanks! I was just going to say that hopefully
> > > > > > > > > > > > > > > transactional.id.expiration.ms would also be
> > over
> > > > the
> > > > > > > > delivery
> > > > > > > > > > > > > timeout.
> > > > > > > > > > > > > > :)
> > > > > > > > > > > > > > > Thanks for the +1!
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Justine
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Mon, Jul 25, 2022 at 9:17 AM Bill Bejeck <
> > > > > > > > bbejeck@gmail.com
> > > > > > > > > >
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Hi Justine,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I just took another look at the KIP, and I
> > > realize
> > > > my
> > > > > > > > > > > > > > question/suggestion
> > > > > > > > > > > > > > > > about default values has already been
> addressed
> > > in
> > > > > the
> > > > > > > > > > > > > `Compatibility`
> > > > > > > > > > > > > > > > section.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I'm +1 on the KIP.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > -Bill
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Thu, Jul 21, 2022 at 6:20 PM Bill Bejeck <
> > > > > > > > > bbejeck@gmail.com
> > > > > > > > > > >
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Hi Justine,
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thanks for the well written KIP, this looks
> > > like
> > > > it
> > > > > > > will
> > > > > > > > > be a
> > > > > > > > > > > > > useful
> > > > > > > > > > > > > > > > > addition.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Overall the KIP looks good to me, I have
> one
> > > > > > > > > > question/comment.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > You mentioned that setting the `
> > > > > > > > producer.id.expiration.ms`
> > > > > > > > > > > less
> > > > > > > > > > > > > than
> > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > delivery timeout could lead to duplicates,
> > > which
> > > > > > makes
> > > > > > > > > sense.
> > > > > > > > > > > To
> > > > > > > > > > > > > > help
> > > > > > > > > > > > > > > > > avoid this situation, do we want to
> consider
> > a
> > > > > > default
> > > > > > > > > value
> > > > > > > > > > > that
> > > > > > > > > > > > > is
> > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > same as the delivery timeout?
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thanks again for the KIP.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Bill
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Thu, Jul 21, 2022 at 4:54 PM Justine
> > Olshan
> > > > > > > > > > > > > > > > > <jo...@confluent.io.invalid> wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >> Hey all!
> > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > >> I'd like to start a discussion on my
> > proposal
> > > to
> > > > > > > > separate
> > > > > > > > > > > > > time-based
> > > > > > > > > > > > > > > > >> producer ID expiration from transactional
> ID
> > > > > > > expiration
> > > > > > > > by
> > > > > > > > > > > > > > > introducing a
> > > > > > > > > > > > > > > > >> new configuration.
> > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > >> The KIP Is pretty small and simple, but
> will
> > > be
> > > > > > > helpful
> > > > > > > > in
> > > > > > > > > > > > > > controlling
> > > > > > > > > > > > > > > > >> memory usage in brokers -- especially now
> > that
> > > > by
> > > > > > > > default
> > > > > > > > > > > > > producers
> > > > > > > > > > > > > > > are
> > > > > > > > > > > > > > > > >> idempotent and create producer ID state.
> > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > >> Please take a look and leave any comments
> > you
> > > > may
> > > > > > > have!
> > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > >> KIP:
> > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-854+Separate+configuration+for+producer+ID+expiry
> > > > > > > > > > > > > > > > >> JIRA:
> > > > > > > https://issues.apache.org/jira/browse/KAFKA-14097
> > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > >> Thanks!
> > > > > > > > > > > > > > > > >> Justine
> > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-854 Separate configuration for producer ID expiry

Posted by Tom Bentley <tb...@redhat.com>.
Hi Justine,

That all seems reasonable to me, thanks!

On Wed, 3 Aug 2022 at 19:14, Justine Olshan <jo...@confluent.io.invalid>
wrote:

> Hi Tom and Ismael,
>
> 1. Yes, there are definitely many ways to improve this issue and I plan to
> write followup KIPs to address some of the larger changes.
> Just wanted to get this simple fix in as a short term measure to prevent
> issues with too many producer IDs in the cache. Stay tuned :)
>
> 2. I did have some offline discussion about informing the client. I think
> for this specific KIP the default behavior in practice should not change
> enough to require this information to go back to the client. In other
> words, a reasonable configuration should not regress behavior. However,
> with the further changes I mention in 1, perhaps this is something we want
> to do. And yes -- unfortunately the current state of Kafka is no longer
> totally consistent with KIP-98. This is something we probably want to
> clarify in the future.
>
> 3. I will update the config to mention it is not dynamic. I think since the
> transactional id configuration is read-only, this should be too.
>
> 4. I can update this wording.
>
> 5. I think there are definitely benefits to the name `
> idempotent.pid.expiration.ms` but there are other ways this could cause
> confusion. And to be clear -- the configuration can expire a producer ID
> for a transactional producer as long as there isn't an ongoing transaction.
>
> Let me know if you have any questions and thanks for taking a look!
>
> Justine
>
> On Wed, Aug 3, 2022 at 9:30 AM Ismael Juma <is...@juma.me.uk> wrote:
>
> > Regarding 1, more can certainly be done, but I think it would be
> > complementary. As such, I think this KIP stands on its own and additional
> > improvements can be handled via future KIPs (unless Justine wants to
> > combine things, of course).
> >
> > Ismael
> >
> > On Wed, Aug 3, 2022 at 9:12 AM Tom Bentley <tb...@redhat.com> wrote:
> >
> > > Hi Justine,
> > >
> > > Thanks for the KIP! I can see that this is a pragmatic attempt to
> > address a
> > > nasty problem. I have a few questions:
> > >
> > > 1. The KIP makes the problem significantly harder to trigger, but
> doesn't
> > > eliminate it entirely. How confident are you that it will be sufficient
> > in
> > > practice? We can point to applications which are creating idempotent
> > > producers at a high rate and say they're broken, but that doesn't do
> > > anything to defend the broker from an interaction pattern that differs
> > only
> > > in rate from a "good application". Did you consider a new quota to
> limit
> > > the rate at which a (principal, clientId) can allocate new PIDs?
> > >
> > > 2. The KIP contains this sentence: "when an idempotent producer’s ID
> > > expires, it silently loses its idempotency guarantees." That's at odds
> > with
> > > my reading of "PID expiration" in the KIP-98 design[1], but it does
> seem
> > > consistent with a (brief!) look at the code. I accept that the risk
> > should
> > > be minimal so long as the expiration time is > the producer's delivery
> > > timeout, but it would still be nice if we could detect this situation
> and
> > > return an error to the client. Is there a reason for the apparent
> > deviation
> > > from KIP-98 (or am I misreading the code?)
> > >
> > > 3. Could the KIP be explicit on whether the new config will be
> > dynamically
> > > changeable?
> > >
> > > 4. The description of producer.id.expiration.ms mentions the
> > > ProducerStateManager, which will mean nothing to a normal user. We
> could
> > > probably change it to "a topic partition leader" without loss of
> meaning.
> > >
> > > 5. The description also says "Producer IDs will not expire while a
> > > transaction associated to them is still ongoing." To me this suggests
> > that
> > > a more intuitive name for this config (from the user PoV) would include
> > > "idempotent", since it does not cover the transactional case. (I would
> > > suggest "idempotent.pid.expiration.ms" (c.f.
> > > transactional.id.expiration.ms),
> > > but the distinction between "id" and "pid" is easily missed–even if
> it's
> > > technically correct–so I'm not sure it's better than what you're
> > > proposing). I only mention it in case it prompts someone else to find a
> > > better name.
> > >
> > > Thanks again,
> > >
> > > Tom
> > >
> > > [1]:
> > >
> > >
> >
> https://docs.google.com/document/d/11Jqy_GjUGtdXJK94XGsEIK7CP1SnQGdp2eF0wSw9ra8/edit#heading=h.loujdamc9ptj
> > >
> > > On Tue, 2 Aug 2022 at 22:00, Justine Olshan
> <jolshan@confluent.io.invalid
> > >
> > > wrote:
> > >
> > > > I've updated the KIP to make the new minimum value 1 and remove the
> -1
> > > > configuration.
> > > > I've also added the low priority to the configuration and edited the
> > > > description as Ismael mentioned.
> > > >
> > > > I'm thinking about bringing this KIP to a vote soon! Let me know if
> > there
> > > > are any other comments or questions.
> > > >
> > > > Thanks,
> > > > Justine
> > > >
> > > > On Tue, Aug 2, 2022 at 9:02 AM Jason Gustafson
> > > <jason@confluent.io.invalid
> > > > >
> > > > wrote:
> > > >
> > > > > I agree with Ismael that we should remove -1. I think we tend to
> view
> > > the
> > > > > coupling of these behaviors into a single configuration as a
> mistake,
> > > so
> > > > > it's a little odd to keep it (even if in a weakened form).
> > > > >
> > > > > -Jason
> > > > >
> > > > > On Sat, Jul 30, 2022 at 7:37 AM Ismael Juma <is...@juma.me.uk>
> > wrote:
> > > > >
> > > > > > I would remove -1 altogether. Two more comments:
> > > > > >
> > > > > > 1. The current description kind of leads people towards aligning
> > the
> > > > > config
> > > > > > with delivery.timeout.ms. Is that what we want? We could say it
> > > should
> > > > > be
> > > > > > higher than delivery.timeout.ms but indicate that the default is
> > > > usually
> > > > > > fine. The main reason to reduce it would be to save memory, I
> > guess.
> > > > > > 2. Each config has a priority, we should specify it for this one.
> > I'm
> > > > > > assuming it will be "low".
> > > > > >
> > > > > > Ismael
> > > > > >
> > > > > > On Fri, Jul 29, 2022 at 2:38 PM Justine Olshan
> > > > > > <jo...@confluent.io.invalid>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > I've updated the KIP to include the new default of 1 day and
> > > > > information
> > > > > > > about -1 in the description of the config.
> > > > > > > I wonder though if including -1 makes sense now that it is not
> > the
> > > > > > default
> > > > > > > value. Is there a benefit for manually setting -1 vs manually
> > > setting
> > > > > the
> > > > > > > value that transactional.id.expiration.ms has?
> > > > > > >
> > > > > > > Let me know your thoughts.
> > > > > > > Thanks,
> > > > > > > Justine
> > > > > > >
> > > > > > > On Fri, Jul 29, 2022 at 10:54 AM Ismael Juma <
> ismael@juma.me.uk>
> > > > > wrote:
> > > > > > >
> > > > > > > > +1 for having 1 day as the default and for including this
> > change
> > > in
> > > > > the
> > > > > > > > release notes.
> > > > > > > >
> > > > > > > > Ismael
> > > > > > > >
> > > > > > > > On Wed, Jul 27, 2022 at 9:16 AM Jason Gustafson
> > > > > > > <jason@confluent.io.invalid
> > > > > > > > >
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > I don't think a major release is a requirement for a
> default
> > > > change
> > > > > > for
> > > > > > > > > what it's worth. I do appreciate that there is a preference
> > for
> > > > not
> > > > > > > > rocking
> > > > > > > > > the boat though. For a little bit of background here, the
> > > problem
> > > > > we
> > > > > > > > > have encountered in production since the idempotent
> producer
> > > > became
> > > > > > the
> > > > > > > > > default is OOM errors due to huge numbers of producerIds
> that
> > > had
> > > > > to
> > > > > > be
> > > > > > > > > retained in the cache for 7 days. It is hard to prevent use
> > > cases
> > > > > > from
> > > > > > > > > emerging where producers are used and discarded rapidly. We
> > > will
> > > > be
> > > > > > > > using a
> > > > > > > > > lower value for sure, but it would also be nice to reduce
> the
> > > > > > > likelihood
> > > > > > > > > for the community to see this problem. The benefit of the
> > > caching
> > > > > > > > > diminishes quickly over time since it is primarily meant to
> > > > handle
> > > > > > > > producer
> > > > > > > > > retry windows. I do not think there is much difference
> > between
> > > 1
> > > > > days
> > > > > > > > and 7
> > > > > > > > > days from an application perspective, but it is a huge
> > > difference
> > > > > for
> > > > > > > the
> > > > > > > > > broker's memory usage.
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Jason
> > > > > > > > >
> > > > > > > > > On Wed, Jul 27, 2022 at 2:57 AM Sagar <
> > > sagarmeansocean@gmail.com
> > > > >
> > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Thanks Justine for the KIP. I think it might be better to
> > > > > document
> > > > > > > the
> > > > > > > > > > correlation between the new config and
> delivery.timeout.ms
> > > in
> > > > > the
> > > > > > > > Public
> > > > > > > > > > Interfaces Description.
> > > > > > > > > >
> > > > > > > > > > Also, I agree with Luke that for now setting a default to
> > -1
> > > > > should
> > > > > > > be
> > > > > > > > > > good. We can look to switch to 1 day with major release.
> > > > > > > > > >
> > > > > > > > > > Thanks!
> > > > > > > > > > Sagar.
> > > > > > > > > >
> > > > > > > > > > On Wed, Jul 27, 2022 at 9:05 AM Luke Chen <
> > showuon@gmail.com
> > > >
> > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi Justine,
> > > > > > > > > > >
> > > > > > > > > > > Thanks for the KIP.
> > > > > > > > > > > I agree with you that we should try our best to keep
> > > backward
> > > > > > > > > > > compatibility, although our intention is to have lower
> > > > producer
> > > > > > id
> > > > > > > > > > > expiration timeout.
> > > > > > > > > > > So, I think we should keep default to -1 IMO.
> > > > > > > > > > > Maybe we change the default to 1 day in next major
> > release
> > > > > (4.0)?
> > > > > > > > > > >
> > > > > > > > > > > Thank you.
> > > > > > > > > > > Luke
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Jul 27, 2022 at 7:13 AM Justine Olshan
> > > > > > > > > > > <jo...@confluent.io.invalid>
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Thanks for taking a look Jason!
> > > > > > > > > > > >
> > > > > > > > > > > > I wondered if we wanted to have a smaller default but
> > > > wasn't
> > > > > > sure
> > > > > > > > > about
> > > > > > > > > > > the
> > > > > > > > > > > > compatibility story -- especially since there is the
> > > chance
> > > > > for
> > > > > > > > > > producer
> > > > > > > > > > > > IDs to expire silently.
> > > > > > > > > > > > I do think that 1 day is fairly reasonable. If I
> don't
> > > hear
> > > > > any
> > > > > > > > > > > conflicting
> > > > > > > > > > > > opinions I can go ahead and update the default.
> > > > > > > > > > > >
> > > > > > > > > > > > Justine
> > > > > > > > > > > >
> > > > > > > > > > > > On Tue, Jul 26, 2022 at 12:23 PM Jason Gustafson
> > > > > > > > > > > > <ja...@confluent.io.invalid>
> > > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi Justine,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks for the KIP. Although I hate seeing new
> > > > > > configurations,
> > > > > > > I
> > > > > > > > > > think
> > > > > > > > > > > > this
> > > > > > > > > > > > > is a good change. Combining these timeout behaviors
> > > into
> > > > a
> > > > > > > single
> > > > > > > > > > > > > configuration was definitely a mistake, but we
> didn't
> > > > > > > anticipate
> > > > > > > > > the
> > > > > > > > > > > > > problem with the producer id cache. I do wonder if
> we
> > > can
> > > > > > make
> > > > > > > > the
> > > > > > > > > > > > default
> > > > > > > > > > > > > a bit lower to reduce the chances that users will
> hit
> > > the
> > > > > > same
> > > > > > > > > memory
> > > > > > > > > > > > > issues we have seen. After decoupling this
> > > configuration
> > > > > from
> > > > > > > > > > > > > transactional.id.expiration.ms, the new timeout
> just
> > > > needs
> > > > > > to
> > > > > > > > > cover
> > > > > > > > > > > the
> > > > > > > > > > > > > longest duration that a producer might be retrying
> > the
> > > > same
> > > > > > > > Produce
> > > > > > > > > > > > > request. 7 days seems too high. Although I think it
> > > could
> > > > > go
> > > > > > a
> > > > > > > > fair
> > > > > > > > > > > even
> > > > > > > > > > > > > lower, perhaps 1 day is a reasonable place to
> start?
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > Jason
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Mon, Jul 25, 2022 at 9:25 AM Justine Olshan
> > > > > > > > > > > > > <jo...@confluent.io.invalid>
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hey Bill,
> > > > > > > > > > > > > > Thanks! I was just going to say that hopefully
> > > > > > > > > > > > > > transactional.id.expiration.ms would also be
> over
> > > the
> > > > > > > delivery
> > > > > > > > > > > > timeout.
> > > > > > > > > > > > > :)
> > > > > > > > > > > > > > Thanks for the +1!
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Justine
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Mon, Jul 25, 2022 at 9:17 AM Bill Bejeck <
> > > > > > > bbejeck@gmail.com
> > > > > > > > >
> > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hi Justine,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I just took another look at the KIP, and I
> > realize
> > > my
> > > > > > > > > > > > > question/suggestion
> > > > > > > > > > > > > > > about default values has already been addressed
> > in
> > > > the
> > > > > > > > > > > > `Compatibility`
> > > > > > > > > > > > > > > section.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I'm +1 on the KIP.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > -Bill
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Thu, Jul 21, 2022 at 6:20 PM Bill Bejeck <
> > > > > > > > bbejeck@gmail.com
> > > > > > > > > >
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Hi Justine,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thanks for the well written KIP, this looks
> > like
> > > it
> > > > > > will
> > > > > > > > be a
> > > > > > > > > > > > useful
> > > > > > > > > > > > > > > > addition.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Overall the KIP looks good to me, I have one
> > > > > > > > > question/comment.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > You mentioned that setting the `
> > > > > > > producer.id.expiration.ms`
> > > > > > > > > > less
> > > > > > > > > > > > than
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > delivery timeout could lead to duplicates,
> > which
> > > > > makes
> > > > > > > > sense.
> > > > > > > > > > To
> > > > > > > > > > > > > help
> > > > > > > > > > > > > > > > avoid this situation, do we want to consider
> a
> > > > > default
> > > > > > > > value
> > > > > > > > > > that
> > > > > > > > > > > > is
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > same as the delivery timeout?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thanks again for the KIP.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Bill
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Thu, Jul 21, 2022 at 4:54 PM Justine
> Olshan
> > > > > > > > > > > > > > > > <jo...@confluent.io.invalid> wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >> Hey all!
> > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > >> I'd like to start a discussion on my
> proposal
> > to
> > > > > > > separate
> > > > > > > > > > > > time-based
> > > > > > > > > > > > > > > >> producer ID expiration from transactional ID
> > > > > > expiration
> > > > > > > by
> > > > > > > > > > > > > > introducing a
> > > > > > > > > > > > > > > >> new configuration.
> > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > >> The KIP Is pretty small and simple, but will
> > be
> > > > > > helpful
> > > > > > > in
> > > > > > > > > > > > > controlling
> > > > > > > > > > > > > > > >> memory usage in brokers -- especially now
> that
> > > by
> > > > > > > default
> > > > > > > > > > > > producers
> > > > > > > > > > > > > > are
> > > > > > > > > > > > > > > >> idempotent and create producer ID state.
> > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > >> Please take a look and leave any comments
> you
> > > may
> > > > > > have!
> > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > >> KIP:
> > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-854+Separate+configuration+for+producer+ID+expiry
> > > > > > > > > > > > > > > >> JIRA:
> > > > > > https://issues.apache.org/jira/browse/KAFKA-14097
> > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > >> Thanks!
> > > > > > > > > > > > > > > >> Justine
> > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-854 Separate configuration for producer ID expiry

Posted by Justine Olshan <jo...@confluent.io.INVALID>.
Hi Tom and Ismael,

1. Yes, there are definitely many ways to improve this issue and I plan to
write followup KIPs to address some of the larger changes.
Just wanted to get this simple fix in as a short term measure to prevent
issues with too many producer IDs in the cache. Stay tuned :)

2. I did have some offline discussion about informing the client. I think
for this specific KIP the default behavior in practice should not change
enough to require this information to go back to the client. In other
words, a reasonable configuration should not regress behavior. However,
with the further changes I mention in 1, perhaps this is something we want
to do. And yes -- unfortunately the current state of Kafka is no longer
totally consistent with KIP-98. This is something we probably want to
clarify in the future.

3. I will update the config to mention it is not dynamic. I think since the
transactional id configuration is read-only, this should be too.

4. I can update this wording.

5. I think there are definitely benefits to the name `
idempotent.pid.expiration.ms` but there are other ways this could cause
confusion. And to be clear -- the configuration can expire a producer ID
for a transactional producer as long as there isn't an ongoing transaction.

Let me know if you have any questions and thanks for taking a look!

Justine

On Wed, Aug 3, 2022 at 9:30 AM Ismael Juma <is...@juma.me.uk> wrote:

> Regarding 1, more can certainly be done, but I think it would be
> complementary. As such, I think this KIP stands on its own and additional
> improvements can be handled via future KIPs (unless Justine wants to
> combine things, of course).
>
> Ismael
>
> On Wed, Aug 3, 2022 at 9:12 AM Tom Bentley <tb...@redhat.com> wrote:
>
> > Hi Justine,
> >
> > Thanks for the KIP! I can see that this is a pragmatic attempt to
> address a
> > nasty problem. I have a few questions:
> >
> > 1. The KIP makes the problem significantly harder to trigger, but doesn't
> > eliminate it entirely. How confident are you that it will be sufficient
> in
> > practice? We can point to applications which are creating idempotent
> > producers at a high rate and say they're broken, but that doesn't do
> > anything to defend the broker from an interaction pattern that differs
> only
> > in rate from a "good application". Did you consider a new quota to limit
> > the rate at which a (principal, clientId) can allocate new PIDs?
> >
> > 2. The KIP contains this sentence: "when an idempotent producer’s ID
> > expires, it silently loses its idempotency guarantees." That's at odds
> with
> > my reading of "PID expiration" in the KIP-98 design[1], but it does seem
> > consistent with a (brief!) look at the code. I accept that the risk
> should
> > be minimal so long as the expiration time is > the producer's delivery
> > timeout, but it would still be nice if we could detect this situation and
> > return an error to the client. Is there a reason for the apparent
> deviation
> > from KIP-98 (or am I misreading the code?)
> >
> > 3. Could the KIP be explicit on whether the new config will be
> dynamically
> > changeable?
> >
> > 4. The description of producer.id.expiration.ms mentions the
> > ProducerStateManager, which will mean nothing to a normal user. We could
> > probably change it to "a topic partition leader" without loss of meaning.
> >
> > 5. The description also says "Producer IDs will not expire while a
> > transaction associated to them is still ongoing." To me this suggests
> that
> > a more intuitive name for this config (from the user PoV) would include
> > "idempotent", since it does not cover the transactional case. (I would
> > suggest "idempotent.pid.expiration.ms" (c.f.
> > transactional.id.expiration.ms),
> > but the distinction between "id" and "pid" is easily missed–even if it's
> > technically correct–so I'm not sure it's better than what you're
> > proposing). I only mention it in case it prompts someone else to find a
> > better name.
> >
> > Thanks again,
> >
> > Tom
> >
> > [1]:
> >
> >
> https://docs.google.com/document/d/11Jqy_GjUGtdXJK94XGsEIK7CP1SnQGdp2eF0wSw9ra8/edit#heading=h.loujdamc9ptj
> >
> > On Tue, 2 Aug 2022 at 22:00, Justine Olshan <jolshan@confluent.io.invalid
> >
> > wrote:
> >
> > > I've updated the KIP to make the new minimum value 1 and remove the -1
> > > configuration.
> > > I've also added the low priority to the configuration and edited the
> > > description as Ismael mentioned.
> > >
> > > I'm thinking about bringing this KIP to a vote soon! Let me know if
> there
> > > are any other comments or questions.
> > >
> > > Thanks,
> > > Justine
> > >
> > > On Tue, Aug 2, 2022 at 9:02 AM Jason Gustafson
> > <jason@confluent.io.invalid
> > > >
> > > wrote:
> > >
> > > > I agree with Ismael that we should remove -1. I think we tend to view
> > the
> > > > coupling of these behaviors into a single configuration as a mistake,
> > so
> > > > it's a little odd to keep it (even if in a weakened form).
> > > >
> > > > -Jason
> > > >
> > > > On Sat, Jul 30, 2022 at 7:37 AM Ismael Juma <is...@juma.me.uk>
> wrote:
> > > >
> > > > > I would remove -1 altogether. Two more comments:
> > > > >
> > > > > 1. The current description kind of leads people towards aligning
> the
> > > > config
> > > > > with delivery.timeout.ms. Is that what we want? We could say it
> > should
> > > > be
> > > > > higher than delivery.timeout.ms but indicate that the default is
> > > usually
> > > > > fine. The main reason to reduce it would be to save memory, I
> guess.
> > > > > 2. Each config has a priority, we should specify it for this one.
> I'm
> > > > > assuming it will be "low".
> > > > >
> > > > > Ismael
> > > > >
> > > > > On Fri, Jul 29, 2022 at 2:38 PM Justine Olshan
> > > > > <jo...@confluent.io.invalid>
> > > > > wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > I've updated the KIP to include the new default of 1 day and
> > > > information
> > > > > > about -1 in the description of the config.
> > > > > > I wonder though if including -1 makes sense now that it is not
> the
> > > > > default
> > > > > > value. Is there a benefit for manually setting -1 vs manually
> > setting
> > > > the
> > > > > > value that transactional.id.expiration.ms has?
> > > > > >
> > > > > > Let me know your thoughts.
> > > > > > Thanks,
> > > > > > Justine
> > > > > >
> > > > > > On Fri, Jul 29, 2022 at 10:54 AM Ismael Juma <is...@juma.me.uk>
> > > > wrote:
> > > > > >
> > > > > > > +1 for having 1 day as the default and for including this
> change
> > in
> > > > the
> > > > > > > release notes.
> > > > > > >
> > > > > > > Ismael
> > > > > > >
> > > > > > > On Wed, Jul 27, 2022 at 9:16 AM Jason Gustafson
> > > > > > <jason@confluent.io.invalid
> > > > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > I don't think a major release is a requirement for a default
> > > change
> > > > > for
> > > > > > > > what it's worth. I do appreciate that there is a preference
> for
> > > not
> > > > > > > rocking
> > > > > > > > the boat though. For a little bit of background here, the
> > problem
> > > > we
> > > > > > > > have encountered in production since the idempotent producer
> > > became
> > > > > the
> > > > > > > > default is OOM errors due to huge numbers of producerIds that
> > had
> > > > to
> > > > > be
> > > > > > > > retained in the cache for 7 days. It is hard to prevent use
> > cases
> > > > > from
> > > > > > > > emerging where producers are used and discarded rapidly. We
> > will
> > > be
> > > > > > > using a
> > > > > > > > lower value for sure, but it would also be nice to reduce the
> > > > > > likelihood
> > > > > > > > for the community to see this problem. The benefit of the
> > caching
> > > > > > > > diminishes quickly over time since it is primarily meant to
> > > handle
> > > > > > > producer
> > > > > > > > retry windows. I do not think there is much difference
> between
> > 1
> > > > days
> > > > > > > and 7
> > > > > > > > days from an application perspective, but it is a huge
> > difference
> > > > for
> > > > > > the
> > > > > > > > broker's memory usage.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Jason
> > > > > > > >
> > > > > > > > On Wed, Jul 27, 2022 at 2:57 AM Sagar <
> > sagarmeansocean@gmail.com
> > > >
> > > > > > wrote:
> > > > > > > >
> > > > > > > > > Thanks Justine for the KIP. I think it might be better to
> > > > document
> > > > > > the
> > > > > > > > > correlation between the new config and delivery.timeout.ms
> > in
> > > > the
> > > > > > > Public
> > > > > > > > > Interfaces Description.
> > > > > > > > >
> > > > > > > > > Also, I agree with Luke that for now setting a default to
> -1
> > > > should
> > > > > > be
> > > > > > > > > good. We can look to switch to 1 day with major release.
> > > > > > > > >
> > > > > > > > > Thanks!
> > > > > > > > > Sagar.
> > > > > > > > >
> > > > > > > > > On Wed, Jul 27, 2022 at 9:05 AM Luke Chen <
> showuon@gmail.com
> > >
> > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi Justine,
> > > > > > > > > >
> > > > > > > > > > Thanks for the KIP.
> > > > > > > > > > I agree with you that we should try our best to keep
> > backward
> > > > > > > > > > compatibility, although our intention is to have lower
> > > producer
> > > > > id
> > > > > > > > > > expiration timeout.
> > > > > > > > > > So, I think we should keep default to -1 IMO.
> > > > > > > > > > Maybe we change the default to 1 day in next major
> release
> > > > (4.0)?
> > > > > > > > > >
> > > > > > > > > > Thank you.
> > > > > > > > > > Luke
> > > > > > > > > >
> > > > > > > > > > On Wed, Jul 27, 2022 at 7:13 AM Justine Olshan
> > > > > > > > > > <jo...@confluent.io.invalid>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Thanks for taking a look Jason!
> > > > > > > > > > >
> > > > > > > > > > > I wondered if we wanted to have a smaller default but
> > > wasn't
> > > > > sure
> > > > > > > > about
> > > > > > > > > > the
> > > > > > > > > > > compatibility story -- especially since there is the
> > chance
> > > > for
> > > > > > > > > producer
> > > > > > > > > > > IDs to expire silently.
> > > > > > > > > > > I do think that 1 day is fairly reasonable. If I don't
> > hear
> > > > any
> > > > > > > > > > conflicting
> > > > > > > > > > > opinions I can go ahead and update the default.
> > > > > > > > > > >
> > > > > > > > > > > Justine
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Jul 26, 2022 at 12:23 PM Jason Gustafson
> > > > > > > > > > > <ja...@confluent.io.invalid>
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi Justine,
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks for the KIP. Although I hate seeing new
> > > > > configurations,
> > > > > > I
> > > > > > > > > think
> > > > > > > > > > > this
> > > > > > > > > > > > is a good change. Combining these timeout behaviors
> > into
> > > a
> > > > > > single
> > > > > > > > > > > > configuration was definitely a mistake, but we didn't
> > > > > > anticipate
> > > > > > > > the
> > > > > > > > > > > > problem with the producer id cache. I do wonder if we
> > can
> > > > > make
> > > > > > > the
> > > > > > > > > > > default
> > > > > > > > > > > > a bit lower to reduce the chances that users will hit
> > the
> > > > > same
> > > > > > > > memory
> > > > > > > > > > > > issues we have seen. After decoupling this
> > configuration
> > > > from
> > > > > > > > > > > > transactional.id.expiration.ms, the new timeout just
> > > needs
> > > > > to
> > > > > > > > cover
> > > > > > > > > > the
> > > > > > > > > > > > longest duration that a producer might be retrying
> the
> > > same
> > > > > > > Produce
> > > > > > > > > > > > request. 7 days seems too high. Although I think it
> > could
> > > > go
> > > > > a
> > > > > > > fair
> > > > > > > > > > even
> > > > > > > > > > > > lower, perhaps 1 day is a reasonable place to start?
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > > Jason
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, Jul 25, 2022 at 9:25 AM Justine Olshan
> > > > > > > > > > > > <jo...@confluent.io.invalid>
> > > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hey Bill,
> > > > > > > > > > > > > Thanks! I was just going to say that hopefully
> > > > > > > > > > > > > transactional.id.expiration.ms would also be over
> > the
> > > > > > delivery
> > > > > > > > > > > timeout.
> > > > > > > > > > > > :)
> > > > > > > > > > > > > Thanks for the +1!
> > > > > > > > > > > > >
> > > > > > > > > > > > > Justine
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Mon, Jul 25, 2022 at 9:17 AM Bill Bejeck <
> > > > > > bbejeck@gmail.com
> > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi Justine,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I just took another look at the KIP, and I
> realize
> > my
> > > > > > > > > > > > question/suggestion
> > > > > > > > > > > > > > about default values has already been addressed
> in
> > > the
> > > > > > > > > > > `Compatibility`
> > > > > > > > > > > > > > section.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I'm +1 on the KIP.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > -Bill
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Thu, Jul 21, 2022 at 6:20 PM Bill Bejeck <
> > > > > > > bbejeck@gmail.com
> > > > > > > > >
> > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hi Justine,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks for the well written KIP, this looks
> like
> > it
> > > > > will
> > > > > > > be a
> > > > > > > > > > > useful
> > > > > > > > > > > > > > > addition.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Overall the KIP looks good to me, I have one
> > > > > > > > question/comment.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > You mentioned that setting the `
> > > > > > producer.id.expiration.ms`
> > > > > > > > > less
> > > > > > > > > > > than
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > delivery timeout could lead to duplicates,
> which
> > > > makes
> > > > > > > sense.
> > > > > > > > > To
> > > > > > > > > > > > help
> > > > > > > > > > > > > > > avoid this situation, do we want to consider a
> > > > default
> > > > > > > value
> > > > > > > > > that
> > > > > > > > > > > is
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > same as the delivery timeout?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks again for the KIP.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Bill
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Thu, Jul 21, 2022 at 4:54 PM Justine Olshan
> > > > > > > > > > > > > > > <jo...@confluent.io.invalid> wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >> Hey all!
> > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >> I'd like to start a discussion on my proposal
> to
> > > > > > separate
> > > > > > > > > > > time-based
> > > > > > > > > > > > > > >> producer ID expiration from transactional ID
> > > > > expiration
> > > > > > by
> > > > > > > > > > > > > introducing a
> > > > > > > > > > > > > > >> new configuration.
> > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >> The KIP Is pretty small and simple, but will
> be
> > > > > helpful
> > > > > > in
> > > > > > > > > > > > controlling
> > > > > > > > > > > > > > >> memory usage in brokers -- especially now that
> > by
> > > > > > default
> > > > > > > > > > > producers
> > > > > > > > > > > > > are
> > > > > > > > > > > > > > >> idempotent and create producer ID state.
> > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >> Please take a look and leave any comments you
> > may
> > > > > have!
> > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >> KIP:
> > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >>
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-854+Separate+configuration+for+producer+ID+expiry
> > > > > > > > > > > > > > >> JIRA:
> > > > > https://issues.apache.org/jira/browse/KAFKA-14097
> > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >> Thanks!
> > > > > > > > > > > > > > >> Justine
> > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-854 Separate configuration for producer ID expiry

Posted by Ismael Juma <is...@juma.me.uk>.
Regarding 1, more can certainly be done, but I think it would be
complementary. As such, I think this KIP stands on its own and additional
improvements can be handled via future KIPs (unless Justine wants to
combine things, of course).

Ismael

On Wed, Aug 3, 2022 at 9:12 AM Tom Bentley <tb...@redhat.com> wrote:

> Hi Justine,
>
> Thanks for the KIP! I can see that this is a pragmatic attempt to address a
> nasty problem. I have a few questions:
>
> 1. The KIP makes the problem significantly harder to trigger, but doesn't
> eliminate it entirely. How confident are you that it will be sufficient in
> practice? We can point to applications which are creating idempotent
> producers at a high rate and say they're broken, but that doesn't do
> anything to defend the broker from an interaction pattern that differs only
> in rate from a "good application". Did you consider a new quota to limit
> the rate at which a (principal, clientId) can allocate new PIDs?
>
> 2. The KIP contains this sentence: "when an idempotent producer’s ID
> expires, it silently loses its idempotency guarantees." That's at odds with
> my reading of "PID expiration" in the KIP-98 design[1], but it does seem
> consistent with a (brief!) look at the code. I accept that the risk should
> be minimal so long as the expiration time is > the producer's delivery
> timeout, but it would still be nice if we could detect this situation and
> return an error to the client. Is there a reason for the apparent deviation
> from KIP-98 (or am I misreading the code?)
>
> 3. Could the KIP be explicit on whether the new config will be dynamically
> changeable?
>
> 4. The description of producer.id.expiration.ms mentions the
> ProducerStateManager, which will mean nothing to a normal user. We could
> probably change it to "a topic partition leader" without loss of meaning.
>
> 5. The description also says "Producer IDs will not expire while a
> transaction associated to them is still ongoing." To me this suggests that
> a more intuitive name for this config (from the user PoV) would include
> "idempotent", since it does not cover the transactional case. (I would
> suggest "idempotent.pid.expiration.ms" (c.f.
> transactional.id.expiration.ms),
> but the distinction between "id" and "pid" is easily missed–even if it's
> technically correct–so I'm not sure it's better than what you're
> proposing). I only mention it in case it prompts someone else to find a
> better name.
>
> Thanks again,
>
> Tom
>
> [1]:
>
> https://docs.google.com/document/d/11Jqy_GjUGtdXJK94XGsEIK7CP1SnQGdp2eF0wSw9ra8/edit#heading=h.loujdamc9ptj
>
> On Tue, 2 Aug 2022 at 22:00, Justine Olshan <jo...@confluent.io.invalid>
> wrote:
>
> > I've updated the KIP to make the new minimum value 1 and remove the -1
> > configuration.
> > I've also added the low priority to the configuration and edited the
> > description as Ismael mentioned.
> >
> > I'm thinking about bringing this KIP to a vote soon! Let me know if there
> > are any other comments or questions.
> >
> > Thanks,
> > Justine
> >
> > On Tue, Aug 2, 2022 at 9:02 AM Jason Gustafson
> <jason@confluent.io.invalid
> > >
> > wrote:
> >
> > > I agree with Ismael that we should remove -1. I think we tend to view
> the
> > > coupling of these behaviors into a single configuration as a mistake,
> so
> > > it's a little odd to keep it (even if in a weakened form).
> > >
> > > -Jason
> > >
> > > On Sat, Jul 30, 2022 at 7:37 AM Ismael Juma <is...@juma.me.uk> wrote:
> > >
> > > > I would remove -1 altogether. Two more comments:
> > > >
> > > > 1. The current description kind of leads people towards aligning the
> > > config
> > > > with delivery.timeout.ms. Is that what we want? We could say it
> should
> > > be
> > > > higher than delivery.timeout.ms but indicate that the default is
> > usually
> > > > fine. The main reason to reduce it would be to save memory, I guess.
> > > > 2. Each config has a priority, we should specify it for this one. I'm
> > > > assuming it will be "low".
> > > >
> > > > Ismael
> > > >
> > > > On Fri, Jul 29, 2022 at 2:38 PM Justine Olshan
> > > > <jo...@confluent.io.invalid>
> > > > wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > I've updated the KIP to include the new default of 1 day and
> > > information
> > > > > about -1 in the description of the config.
> > > > > I wonder though if including -1 makes sense now that it is not the
> > > > default
> > > > > value. Is there a benefit for manually setting -1 vs manually
> setting
> > > the
> > > > > value that transactional.id.expiration.ms has?
> > > > >
> > > > > Let me know your thoughts.
> > > > > Thanks,
> > > > > Justine
> > > > >
> > > > > On Fri, Jul 29, 2022 at 10:54 AM Ismael Juma <is...@juma.me.uk>
> > > wrote:
> > > > >
> > > > > > +1 for having 1 day as the default and for including this change
> in
> > > the
> > > > > > release notes.
> > > > > >
> > > > > > Ismael
> > > > > >
> > > > > > On Wed, Jul 27, 2022 at 9:16 AM Jason Gustafson
> > > > > <jason@confluent.io.invalid
> > > > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > I don't think a major release is a requirement for a default
> > change
> > > > for
> > > > > > > what it's worth. I do appreciate that there is a preference for
> > not
> > > > > > rocking
> > > > > > > the boat though. For a little bit of background here, the
> problem
> > > we
> > > > > > > have encountered in production since the idempotent producer
> > became
> > > > the
> > > > > > > default is OOM errors due to huge numbers of producerIds that
> had
> > > to
> > > > be
> > > > > > > retained in the cache for 7 days. It is hard to prevent use
> cases
> > > > from
> > > > > > > emerging where producers are used and discarded rapidly. We
> will
> > be
> > > > > > using a
> > > > > > > lower value for sure, but it would also be nice to reduce the
> > > > > likelihood
> > > > > > > for the community to see this problem. The benefit of the
> caching
> > > > > > > diminishes quickly over time since it is primarily meant to
> > handle
> > > > > > producer
> > > > > > > retry windows. I do not think there is much difference between
> 1
> > > days
> > > > > > and 7
> > > > > > > days from an application perspective, but it is a huge
> difference
> > > for
> > > > > the
> > > > > > > broker's memory usage.
> > > > > > >
> > > > > > > Best,
> > > > > > > Jason
> > > > > > >
> > > > > > > On Wed, Jul 27, 2022 at 2:57 AM Sagar <
> sagarmeansocean@gmail.com
> > >
> > > > > wrote:
> > > > > > >
> > > > > > > > Thanks Justine for the KIP. I think it might be better to
> > > document
> > > > > the
> > > > > > > > correlation between the new config and delivery.timeout.ms
> in
> > > the
> > > > > > Public
> > > > > > > > Interfaces Description.
> > > > > > > >
> > > > > > > > Also, I agree with Luke that for now setting a default to -1
> > > should
> > > > > be
> > > > > > > > good. We can look to switch to 1 day with major release.
> > > > > > > >
> > > > > > > > Thanks!
> > > > > > > > Sagar.
> > > > > > > >
> > > > > > > > On Wed, Jul 27, 2022 at 9:05 AM Luke Chen <showuon@gmail.com
> >
> > > > wrote:
> > > > > > > >
> > > > > > > > > Hi Justine,
> > > > > > > > >
> > > > > > > > > Thanks for the KIP.
> > > > > > > > > I agree with you that we should try our best to keep
> backward
> > > > > > > > > compatibility, although our intention is to have lower
> > producer
> > > > id
> > > > > > > > > expiration timeout.
> > > > > > > > > So, I think we should keep default to -1 IMO.
> > > > > > > > > Maybe we change the default to 1 day in next major release
> > > (4.0)?
> > > > > > > > >
> > > > > > > > > Thank you.
> > > > > > > > > Luke
> > > > > > > > >
> > > > > > > > > On Wed, Jul 27, 2022 at 7:13 AM Justine Olshan
> > > > > > > > > <jo...@confluent.io.invalid>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Thanks for taking a look Jason!
> > > > > > > > > >
> > > > > > > > > > I wondered if we wanted to have a smaller default but
> > wasn't
> > > > sure
> > > > > > > about
> > > > > > > > > the
> > > > > > > > > > compatibility story -- especially since there is the
> chance
> > > for
> > > > > > > > producer
> > > > > > > > > > IDs to expire silently.
> > > > > > > > > > I do think that 1 day is fairly reasonable. If I don't
> hear
> > > any
> > > > > > > > > conflicting
> > > > > > > > > > opinions I can go ahead and update the default.
> > > > > > > > > >
> > > > > > > > > > Justine
> > > > > > > > > >
> > > > > > > > > > On Tue, Jul 26, 2022 at 12:23 PM Jason Gustafson
> > > > > > > > > > <ja...@confluent.io.invalid>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi Justine,
> > > > > > > > > > >
> > > > > > > > > > > Thanks for the KIP. Although I hate seeing new
> > > > configurations,
> > > > > I
> > > > > > > > think
> > > > > > > > > > this
> > > > > > > > > > > is a good change. Combining these timeout behaviors
> into
> > a
> > > > > single
> > > > > > > > > > > configuration was definitely a mistake, but we didn't
> > > > > anticipate
> > > > > > > the
> > > > > > > > > > > problem with the producer id cache. I do wonder if we
> can
> > > > make
> > > > > > the
> > > > > > > > > > default
> > > > > > > > > > > a bit lower to reduce the chances that users will hit
> the
> > > > same
> > > > > > > memory
> > > > > > > > > > > issues we have seen. After decoupling this
> configuration
> > > from
> > > > > > > > > > > transactional.id.expiration.ms, the new timeout just
> > needs
> > > > to
> > > > > > > cover
> > > > > > > > > the
> > > > > > > > > > > longest duration that a producer might be retrying the
> > same
> > > > > > Produce
> > > > > > > > > > > request. 7 days seems too high. Although I think it
> could
> > > go
> > > > a
> > > > > > fair
> > > > > > > > > even
> > > > > > > > > > > lower, perhaps 1 day is a reasonable place to start?
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > > Jason
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Jul 25, 2022 at 9:25 AM Justine Olshan
> > > > > > > > > > > <jo...@confluent.io.invalid>
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hey Bill,
> > > > > > > > > > > > Thanks! I was just going to say that hopefully
> > > > > > > > > > > > transactional.id.expiration.ms would also be over
> the
> > > > > delivery
> > > > > > > > > > timeout.
> > > > > > > > > > > :)
> > > > > > > > > > > > Thanks for the +1!
> > > > > > > > > > > >
> > > > > > > > > > > > Justine
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, Jul 25, 2022 at 9:17 AM Bill Bejeck <
> > > > > bbejeck@gmail.com
> > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi Justine,
> > > > > > > > > > > > >
> > > > > > > > > > > > > I just took another look at the KIP, and I realize
> my
> > > > > > > > > > > question/suggestion
> > > > > > > > > > > > > about default values has already been addressed in
> > the
> > > > > > > > > > `Compatibility`
> > > > > > > > > > > > > section.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I'm +1 on the KIP.
> > > > > > > > > > > > >
> > > > > > > > > > > > > -Bill
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Thu, Jul 21, 2022 at 6:20 PM Bill Bejeck <
> > > > > > bbejeck@gmail.com
> > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi Justine,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks for the well written KIP, this looks like
> it
> > > > will
> > > > > > be a
> > > > > > > > > > useful
> > > > > > > > > > > > > > addition.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Overall the KIP looks good to me, I have one
> > > > > > > question/comment.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > You mentioned that setting the `
> > > > > producer.id.expiration.ms`
> > > > > > > > less
> > > > > > > > > > than
> > > > > > > > > > > > the
> > > > > > > > > > > > > > delivery timeout could lead to duplicates, which
> > > makes
> > > > > > sense.
> > > > > > > > To
> > > > > > > > > > > help
> > > > > > > > > > > > > > avoid this situation, do we want to consider a
> > > default
> > > > > > value
> > > > > > > > that
> > > > > > > > > > is
> > > > > > > > > > > > the
> > > > > > > > > > > > > > same as the delivery timeout?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks again for the KIP.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Bill
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Thu, Jul 21, 2022 at 4:54 PM Justine Olshan
> > > > > > > > > > > > > > <jo...@confluent.io.invalid> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >> Hey all!
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> I'd like to start a discussion on my proposal to
> > > > > separate
> > > > > > > > > > time-based
> > > > > > > > > > > > > >> producer ID expiration from transactional ID
> > > > expiration
> > > > > by
> > > > > > > > > > > > introducing a
> > > > > > > > > > > > > >> new configuration.
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> The KIP Is pretty small and simple, but will be
> > > > helpful
> > > > > in
> > > > > > > > > > > controlling
> > > > > > > > > > > > > >> memory usage in brokers -- especially now that
> by
> > > > > default
> > > > > > > > > > producers
> > > > > > > > > > > > are
> > > > > > > > > > > > > >> idempotent and create producer ID state.
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> Please take a look and leave any comments you
> may
> > > > have!
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> KIP:
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >>
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-854+Separate+configuration+for+producer+ID+expiry
> > > > > > > > > > > > > >> JIRA:
> > > > https://issues.apache.org/jira/browse/KAFKA-14097
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> Thanks!
> > > > > > > > > > > > > >> Justine
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-854 Separate configuration for producer ID expiry

Posted by Tom Bentley <tb...@redhat.com>.
Hi Justine,

Thanks for the KIP! I can see that this is a pragmatic attempt to address a
nasty problem. I have a few questions:

1. The KIP makes the problem significantly harder to trigger, but doesn't
eliminate it entirely. How confident are you that it will be sufficient in
practice? We can point to applications which are creating idempotent
producers at a high rate and say they're broken, but that doesn't do
anything to defend the broker from an interaction pattern that differs only
in rate from a "good application". Did you consider a new quota to limit
the rate at which a (principal, clientId) can allocate new PIDs?

2. The KIP contains this sentence: "when an idempotent producer’s ID
expires, it silently loses its idempotency guarantees." That's at odds with
my reading of "PID expiration" in the KIP-98 design[1], but it does seem
consistent with a (brief!) look at the code. I accept that the risk should
be minimal so long as the expiration time is > the producer's delivery
timeout, but it would still be nice if we could detect this situation and
return an error to the client. Is there a reason for the apparent deviation
from KIP-98 (or am I misreading the code?)

3. Could the KIP be explicit on whether the new config will be dynamically
changeable?

4. The description of producer.id.expiration.ms mentions the
ProducerStateManager, which will mean nothing to a normal user. We could
probably change it to "a topic partition leader" without loss of meaning.

5. The description also says "Producer IDs will not expire while a
transaction associated to them is still ongoing." To me this suggests that
a more intuitive name for this config (from the user PoV) would include
"idempotent", since it does not cover the transactional case. (I would
suggest "idempotent.pid.expiration.ms" (c.f. transactional.id.expiration.ms),
but the distinction between "id" and "pid" is easily missed–even if it's
technically correct–so I'm not sure it's better than what you're
proposing). I only mention it in case it prompts someone else to find a
better name.

Thanks again,

Tom

[1]:
https://docs.google.com/document/d/11Jqy_GjUGtdXJK94XGsEIK7CP1SnQGdp2eF0wSw9ra8/edit#heading=h.loujdamc9ptj

On Tue, 2 Aug 2022 at 22:00, Justine Olshan <jo...@confluent.io.invalid>
wrote:

> I've updated the KIP to make the new minimum value 1 and remove the -1
> configuration.
> I've also added the low priority to the configuration and edited the
> description as Ismael mentioned.
>
> I'm thinking about bringing this KIP to a vote soon! Let me know if there
> are any other comments or questions.
>
> Thanks,
> Justine
>
> On Tue, Aug 2, 2022 at 9:02 AM Jason Gustafson <jason@confluent.io.invalid
> >
> wrote:
>
> > I agree with Ismael that we should remove -1. I think we tend to view the
> > coupling of these behaviors into a single configuration as a mistake, so
> > it's a little odd to keep it (even if in a weakened form).
> >
> > -Jason
> >
> > On Sat, Jul 30, 2022 at 7:37 AM Ismael Juma <is...@juma.me.uk> wrote:
> >
> > > I would remove -1 altogether. Two more comments:
> > >
> > > 1. The current description kind of leads people towards aligning the
> > config
> > > with delivery.timeout.ms. Is that what we want? We could say it should
> > be
> > > higher than delivery.timeout.ms but indicate that the default is
> usually
> > > fine. The main reason to reduce it would be to save memory, I guess.
> > > 2. Each config has a priority, we should specify it for this one. I'm
> > > assuming it will be "low".
> > >
> > > Ismael
> > >
> > > On Fri, Jul 29, 2022 at 2:38 PM Justine Olshan
> > > <jo...@confluent.io.invalid>
> > > wrote:
> > >
> > > > Hi all,
> > > >
> > > > I've updated the KIP to include the new default of 1 day and
> > information
> > > > about -1 in the description of the config.
> > > > I wonder though if including -1 makes sense now that it is not the
> > > default
> > > > value. Is there a benefit for manually setting -1 vs manually setting
> > the
> > > > value that transactional.id.expiration.ms has?
> > > >
> > > > Let me know your thoughts.
> > > > Thanks,
> > > > Justine
> > > >
> > > > On Fri, Jul 29, 2022 at 10:54 AM Ismael Juma <is...@juma.me.uk>
> > wrote:
> > > >
> > > > > +1 for having 1 day as the default and for including this change in
> > the
> > > > > release notes.
> > > > >
> > > > > Ismael
> > > > >
> > > > > On Wed, Jul 27, 2022 at 9:16 AM Jason Gustafson
> > > > <jason@confluent.io.invalid
> > > > > >
> > > > > wrote:
> > > > >
> > > > > > I don't think a major release is a requirement for a default
> change
> > > for
> > > > > > what it's worth. I do appreciate that there is a preference for
> not
> > > > > rocking
> > > > > > the boat though. For a little bit of background here, the problem
> > we
> > > > > > have encountered in production since the idempotent producer
> became
> > > the
> > > > > > default is OOM errors due to huge numbers of producerIds that had
> > to
> > > be
> > > > > > retained in the cache for 7 days. It is hard to prevent use cases
> > > from
> > > > > > emerging where producers are used and discarded rapidly. We will
> be
> > > > > using a
> > > > > > lower value for sure, but it would also be nice to reduce the
> > > > likelihood
> > > > > > for the community to see this problem. The benefit of the caching
> > > > > > diminishes quickly over time since it is primarily meant to
> handle
> > > > > producer
> > > > > > retry windows. I do not think there is much difference between 1
> > days
> > > > > and 7
> > > > > > days from an application perspective, but it is a huge difference
> > for
> > > > the
> > > > > > broker's memory usage.
> > > > > >
> > > > > > Best,
> > > > > > Jason
> > > > > >
> > > > > > On Wed, Jul 27, 2022 at 2:57 AM Sagar <sagarmeansocean@gmail.com
> >
> > > > wrote:
> > > > > >
> > > > > > > Thanks Justine for the KIP. I think it might be better to
> > document
> > > > the
> > > > > > > correlation between the new config and delivery.timeout.ms in
> > the
> > > > > Public
> > > > > > > Interfaces Description.
> > > > > > >
> > > > > > > Also, I agree with Luke that for now setting a default to -1
> > should
> > > > be
> > > > > > > good. We can look to switch to 1 day with major release.
> > > > > > >
> > > > > > > Thanks!
> > > > > > > Sagar.
> > > > > > >
> > > > > > > On Wed, Jul 27, 2022 at 9:05 AM Luke Chen <sh...@gmail.com>
> > > wrote:
> > > > > > >
> > > > > > > > Hi Justine,
> > > > > > > >
> > > > > > > > Thanks for the KIP.
> > > > > > > > I agree with you that we should try our best to keep backward
> > > > > > > > compatibility, although our intention is to have lower
> producer
> > > id
> > > > > > > > expiration timeout.
> > > > > > > > So, I think we should keep default to -1 IMO.
> > > > > > > > Maybe we change the default to 1 day in next major release
> > (4.0)?
> > > > > > > >
> > > > > > > > Thank you.
> > > > > > > > Luke
> > > > > > > >
> > > > > > > > On Wed, Jul 27, 2022 at 7:13 AM Justine Olshan
> > > > > > > > <jo...@confluent.io.invalid>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Thanks for taking a look Jason!
> > > > > > > > >
> > > > > > > > > I wondered if we wanted to have a smaller default but
> wasn't
> > > sure
> > > > > > about
> > > > > > > > the
> > > > > > > > > compatibility story -- especially since there is the chance
> > for
> > > > > > > producer
> > > > > > > > > IDs to expire silently.
> > > > > > > > > I do think that 1 day is fairly reasonable. If I don't hear
> > any
> > > > > > > > conflicting
> > > > > > > > > opinions I can go ahead and update the default.
> > > > > > > > >
> > > > > > > > > Justine
> > > > > > > > >
> > > > > > > > > On Tue, Jul 26, 2022 at 12:23 PM Jason Gustafson
> > > > > > > > > <ja...@confluent.io.invalid>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi Justine,
> > > > > > > > > >
> > > > > > > > > > Thanks for the KIP. Although I hate seeing new
> > > configurations,
> > > > I
> > > > > > > think
> > > > > > > > > this
> > > > > > > > > > is a good change. Combining these timeout behaviors into
> a
> > > > single
> > > > > > > > > > configuration was definitely a mistake, but we didn't
> > > > anticipate
> > > > > > the
> > > > > > > > > > problem with the producer id cache. I do wonder if we can
> > > make
> > > > > the
> > > > > > > > > default
> > > > > > > > > > a bit lower to reduce the chances that users will hit the
> > > same
> > > > > > memory
> > > > > > > > > > issues we have seen. After decoupling this configuration
> > from
> > > > > > > > > > transactional.id.expiration.ms, the new timeout just
> needs
> > > to
> > > > > > cover
> > > > > > > > the
> > > > > > > > > > longest duration that a producer might be retrying the
> same
> > > > > Produce
> > > > > > > > > > request. 7 days seems too high. Although I think it could
> > go
> > > a
> > > > > fair
> > > > > > > > even
> > > > > > > > > > lower, perhaps 1 day is a reasonable place to start?
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Jason
> > > > > > > > > >
> > > > > > > > > > On Mon, Jul 25, 2022 at 9:25 AM Justine Olshan
> > > > > > > > > > <jo...@confluent.io.invalid>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hey Bill,
> > > > > > > > > > > Thanks! I was just going to say that hopefully
> > > > > > > > > > > transactional.id.expiration.ms would also be over the
> > > > delivery
> > > > > > > > > timeout.
> > > > > > > > > > :)
> > > > > > > > > > > Thanks for the +1!
> > > > > > > > > > >
> > > > > > > > > > > Justine
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Jul 25, 2022 at 9:17 AM Bill Bejeck <
> > > > bbejeck@gmail.com
> > > > > >
> > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi Justine,
> > > > > > > > > > > >
> > > > > > > > > > > > I just took another look at the KIP, and I realize my
> > > > > > > > > > question/suggestion
> > > > > > > > > > > > about default values has already been addressed in
> the
> > > > > > > > > `Compatibility`
> > > > > > > > > > > > section.
> > > > > > > > > > > >
> > > > > > > > > > > > I'm +1 on the KIP.
> > > > > > > > > > > >
> > > > > > > > > > > > -Bill
> > > > > > > > > > > >
> > > > > > > > > > > > On Thu, Jul 21, 2022 at 6:20 PM Bill Bejeck <
> > > > > bbejeck@gmail.com
> > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi Justine,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks for the well written KIP, this looks like it
> > > will
> > > > > be a
> > > > > > > > > useful
> > > > > > > > > > > > > addition.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Overall the KIP looks good to me, I have one
> > > > > > question/comment.
> > > > > > > > > > > > >
> > > > > > > > > > > > > You mentioned that setting the `
> > > > producer.id.expiration.ms`
> > > > > > > less
> > > > > > > > > than
> > > > > > > > > > > the
> > > > > > > > > > > > > delivery timeout could lead to duplicates, which
> > makes
> > > > > sense.
> > > > > > > To
> > > > > > > > > > help
> > > > > > > > > > > > > avoid this situation, do we want to consider a
> > default
> > > > > value
> > > > > > > that
> > > > > > > > > is
> > > > > > > > > > > the
> > > > > > > > > > > > > same as the delivery timeout?
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks again for the KIP.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Bill
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Thu, Jul 21, 2022 at 4:54 PM Justine Olshan
> > > > > > > > > > > > > <jo...@confluent.io.invalid> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > >> Hey all!
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> I'd like to start a discussion on my proposal to
> > > > separate
> > > > > > > > > time-based
> > > > > > > > > > > > >> producer ID expiration from transactional ID
> > > expiration
> > > > by
> > > > > > > > > > > introducing a
> > > > > > > > > > > > >> new configuration.
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> The KIP Is pretty small and simple, but will be
> > > helpful
> > > > in
> > > > > > > > > > controlling
> > > > > > > > > > > > >> memory usage in brokers -- especially now that by
> > > > default
> > > > > > > > > producers
> > > > > > > > > > > are
> > > > > > > > > > > > >> idempotent and create producer ID state.
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> Please take a look and leave any comments you may
> > > have!
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> KIP:
> > > > > > > > > > > > >>
> > > > > > > > > > > > >>
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-854+Separate+configuration+for+producer+ID+expiry
> > > > > > > > > > > > >> JIRA:
> > > https://issues.apache.org/jira/browse/KAFKA-14097
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> Thanks!
> > > > > > > > > > > > >> Justine
> > > > > > > > > > > > >>
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-854 Separate configuration for producer ID expiry

Posted by Justine Olshan <jo...@confluent.io.INVALID>.
I've updated the KIP to make the new minimum value 1 and remove the -1
configuration.
I've also added the low priority to the configuration and edited the
description as Ismael mentioned.

I'm thinking about bringing this KIP to a vote soon! Let me know if there
are any other comments or questions.

Thanks,
Justine

On Tue, Aug 2, 2022 at 9:02 AM Jason Gustafson <ja...@confluent.io.invalid>
wrote:

> I agree with Ismael that we should remove -1. I think we tend to view the
> coupling of these behaviors into a single configuration as a mistake, so
> it's a little odd to keep it (even if in a weakened form).
>
> -Jason
>
> On Sat, Jul 30, 2022 at 7:37 AM Ismael Juma <is...@juma.me.uk> wrote:
>
> > I would remove -1 altogether. Two more comments:
> >
> > 1. The current description kind of leads people towards aligning the
> config
> > with delivery.timeout.ms. Is that what we want? We could say it should
> be
> > higher than delivery.timeout.ms but indicate that the default is usually
> > fine. The main reason to reduce it would be to save memory, I guess.
> > 2. Each config has a priority, we should specify it for this one. I'm
> > assuming it will be "low".
> >
> > Ismael
> >
> > On Fri, Jul 29, 2022 at 2:38 PM Justine Olshan
> > <jo...@confluent.io.invalid>
> > wrote:
> >
> > > Hi all,
> > >
> > > I've updated the KIP to include the new default of 1 day and
> information
> > > about -1 in the description of the config.
> > > I wonder though if including -1 makes sense now that it is not the
> > default
> > > value. Is there a benefit for manually setting -1 vs manually setting
> the
> > > value that transactional.id.expiration.ms has?
> > >
> > > Let me know your thoughts.
> > > Thanks,
> > > Justine
> > >
> > > On Fri, Jul 29, 2022 at 10:54 AM Ismael Juma <is...@juma.me.uk>
> wrote:
> > >
> > > > +1 for having 1 day as the default and for including this change in
> the
> > > > release notes.
> > > >
> > > > Ismael
> > > >
> > > > On Wed, Jul 27, 2022 at 9:16 AM Jason Gustafson
> > > <jason@confluent.io.invalid
> > > > >
> > > > wrote:
> > > >
> > > > > I don't think a major release is a requirement for a default change
> > for
> > > > > what it's worth. I do appreciate that there is a preference for not
> > > > rocking
> > > > > the boat though. For a little bit of background here, the problem
> we
> > > > > have encountered in production since the idempotent producer became
> > the
> > > > > default is OOM errors due to huge numbers of producerIds that had
> to
> > be
> > > > > retained in the cache for 7 days. It is hard to prevent use cases
> > from
> > > > > emerging where producers are used and discarded rapidly. We will be
> > > > using a
> > > > > lower value for sure, but it would also be nice to reduce the
> > > likelihood
> > > > > for the community to see this problem. The benefit of the caching
> > > > > diminishes quickly over time since it is primarily meant to handle
> > > > producer
> > > > > retry windows. I do not think there is much difference between 1
> days
> > > > and 7
> > > > > days from an application perspective, but it is a huge difference
> for
> > > the
> > > > > broker's memory usage.
> > > > >
> > > > > Best,
> > > > > Jason
> > > > >
> > > > > On Wed, Jul 27, 2022 at 2:57 AM Sagar <sa...@gmail.com>
> > > wrote:
> > > > >
> > > > > > Thanks Justine for the KIP. I think it might be better to
> document
> > > the
> > > > > > correlation between the new config and delivery.timeout.ms in
> the
> > > > Public
> > > > > > Interfaces Description.
> > > > > >
> > > > > > Also, I agree with Luke that for now setting a default to -1
> should
> > > be
> > > > > > good. We can look to switch to 1 day with major release.
> > > > > >
> > > > > > Thanks!
> > > > > > Sagar.
> > > > > >
> > > > > > On Wed, Jul 27, 2022 at 9:05 AM Luke Chen <sh...@gmail.com>
> > wrote:
> > > > > >
> > > > > > > Hi Justine,
> > > > > > >
> > > > > > > Thanks for the KIP.
> > > > > > > I agree with you that we should try our best to keep backward
> > > > > > > compatibility, although our intention is to have lower producer
> > id
> > > > > > > expiration timeout.
> > > > > > > So, I think we should keep default to -1 IMO.
> > > > > > > Maybe we change the default to 1 day in next major release
> (4.0)?
> > > > > > >
> > > > > > > Thank you.
> > > > > > > Luke
> > > > > > >
> > > > > > > On Wed, Jul 27, 2022 at 7:13 AM Justine Olshan
> > > > > > > <jo...@confluent.io.invalid>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Thanks for taking a look Jason!
> > > > > > > >
> > > > > > > > I wondered if we wanted to have a smaller default but wasn't
> > sure
> > > > > about
> > > > > > > the
> > > > > > > > compatibility story -- especially since there is the chance
> for
> > > > > > producer
> > > > > > > > IDs to expire silently.
> > > > > > > > I do think that 1 day is fairly reasonable. If I don't hear
> any
> > > > > > > conflicting
> > > > > > > > opinions I can go ahead and update the default.
> > > > > > > >
> > > > > > > > Justine
> > > > > > > >
> > > > > > > > On Tue, Jul 26, 2022 at 12:23 PM Jason Gustafson
> > > > > > > > <ja...@confluent.io.invalid>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi Justine,
> > > > > > > > >
> > > > > > > > > Thanks for the KIP. Although I hate seeing new
> > configurations,
> > > I
> > > > > > think
> > > > > > > > this
> > > > > > > > > is a good change. Combining these timeout behaviors into a
> > > single
> > > > > > > > > configuration was definitely a mistake, but we didn't
> > > anticipate
> > > > > the
> > > > > > > > > problem with the producer id cache. I do wonder if we can
> > make
> > > > the
> > > > > > > > default
> > > > > > > > > a bit lower to reduce the chances that users will hit the
> > same
> > > > > memory
> > > > > > > > > issues we have seen. After decoupling this configuration
> from
> > > > > > > > > transactional.id.expiration.ms, the new timeout just needs
> > to
> > > > > cover
> > > > > > > the
> > > > > > > > > longest duration that a producer might be retrying the same
> > > > Produce
> > > > > > > > > request. 7 days seems too high. Although I think it could
> go
> > a
> > > > fair
> > > > > > > even
> > > > > > > > > lower, perhaps 1 day is a reasonable place to start?
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Jason
> > > > > > > > >
> > > > > > > > > On Mon, Jul 25, 2022 at 9:25 AM Justine Olshan
> > > > > > > > > <jo...@confluent.io.invalid>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hey Bill,
> > > > > > > > > > Thanks! I was just going to say that hopefully
> > > > > > > > > > transactional.id.expiration.ms would also be over the
> > > delivery
> > > > > > > > timeout.
> > > > > > > > > :)
> > > > > > > > > > Thanks for the +1!
> > > > > > > > > >
> > > > > > > > > > Justine
> > > > > > > > > >
> > > > > > > > > > On Mon, Jul 25, 2022 at 9:17 AM Bill Bejeck <
> > > bbejeck@gmail.com
> > > > >
> > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi Justine,
> > > > > > > > > > >
> > > > > > > > > > > I just took another look at the KIP, and I realize my
> > > > > > > > > question/suggestion
> > > > > > > > > > > about default values has already been addressed in the
> > > > > > > > `Compatibility`
> > > > > > > > > > > section.
> > > > > > > > > > >
> > > > > > > > > > > I'm +1 on the KIP.
> > > > > > > > > > >
> > > > > > > > > > > -Bill
> > > > > > > > > > >
> > > > > > > > > > > On Thu, Jul 21, 2022 at 6:20 PM Bill Bejeck <
> > > > bbejeck@gmail.com
> > > > > >
> > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi Justine,
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks for the well written KIP, this looks like it
> > will
> > > > be a
> > > > > > > > useful
> > > > > > > > > > > > addition.
> > > > > > > > > > > >
> > > > > > > > > > > > Overall the KIP looks good to me, I have one
> > > > > question/comment.
> > > > > > > > > > > >
> > > > > > > > > > > > You mentioned that setting the `
> > > producer.id.expiration.ms`
> > > > > > less
> > > > > > > > than
> > > > > > > > > > the
> > > > > > > > > > > > delivery timeout could lead to duplicates, which
> makes
> > > > sense.
> > > > > > To
> > > > > > > > > help
> > > > > > > > > > > > avoid this situation, do we want to consider a
> default
> > > > value
> > > > > > that
> > > > > > > > is
> > > > > > > > > > the
> > > > > > > > > > > > same as the delivery timeout?
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks again for the KIP.
> > > > > > > > > > > >
> > > > > > > > > > > > Bill
> > > > > > > > > > > >
> > > > > > > > > > > > On Thu, Jul 21, 2022 at 4:54 PM Justine Olshan
> > > > > > > > > > > > <jo...@confluent.io.invalid> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > >> Hey all!
> > > > > > > > > > > >>
> > > > > > > > > > > >> I'd like to start a discussion on my proposal to
> > > separate
> > > > > > > > time-based
> > > > > > > > > > > >> producer ID expiration from transactional ID
> > expiration
> > > by
> > > > > > > > > > introducing a
> > > > > > > > > > > >> new configuration.
> > > > > > > > > > > >>
> > > > > > > > > > > >> The KIP Is pretty small and simple, but will be
> > helpful
> > > in
> > > > > > > > > controlling
> > > > > > > > > > > >> memory usage in brokers -- especially now that by
> > > default
> > > > > > > > producers
> > > > > > > > > > are
> > > > > > > > > > > >> idempotent and create producer ID state.
> > > > > > > > > > > >>
> > > > > > > > > > > >> Please take a look and leave any comments you may
> > have!
> > > > > > > > > > > >>
> > > > > > > > > > > >> KIP:
> > > > > > > > > > > >>
> > > > > > > > > > > >>
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-854+Separate+configuration+for+producer+ID+expiry
> > > > > > > > > > > >> JIRA:
> > https://issues.apache.org/jira/browse/KAFKA-14097
> > > > > > > > > > > >>
> > > > > > > > > > > >> Thanks!
> > > > > > > > > > > >> Justine
> > > > > > > > > > > >>
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-854 Separate configuration for producer ID expiry

Posted by Jason Gustafson <ja...@confluent.io.INVALID>.
I agree with Ismael that we should remove -1. I think we tend to view the
coupling of these behaviors into a single configuration as a mistake, so
it's a little odd to keep it (even if in a weakened form).

-Jason

On Sat, Jul 30, 2022 at 7:37 AM Ismael Juma <is...@juma.me.uk> wrote:

> I would remove -1 altogether. Two more comments:
>
> 1. The current description kind of leads people towards aligning the config
> with delivery.timeout.ms. Is that what we want? We could say it should be
> higher than delivery.timeout.ms but indicate that the default is usually
> fine. The main reason to reduce it would be to save memory, I guess.
> 2. Each config has a priority, we should specify it for this one. I'm
> assuming it will be "low".
>
> Ismael
>
> On Fri, Jul 29, 2022 at 2:38 PM Justine Olshan
> <jo...@confluent.io.invalid>
> wrote:
>
> > Hi all,
> >
> > I've updated the KIP to include the new default of 1 day and information
> > about -1 in the description of the config.
> > I wonder though if including -1 makes sense now that it is not the
> default
> > value. Is there a benefit for manually setting -1 vs manually setting the
> > value that transactional.id.expiration.ms has?
> >
> > Let me know your thoughts.
> > Thanks,
> > Justine
> >
> > On Fri, Jul 29, 2022 at 10:54 AM Ismael Juma <is...@juma.me.uk> wrote:
> >
> > > +1 for having 1 day as the default and for including this change in the
> > > release notes.
> > >
> > > Ismael
> > >
> > > On Wed, Jul 27, 2022 at 9:16 AM Jason Gustafson
> > <jason@confluent.io.invalid
> > > >
> > > wrote:
> > >
> > > > I don't think a major release is a requirement for a default change
> for
> > > > what it's worth. I do appreciate that there is a preference for not
> > > rocking
> > > > the boat though. For a little bit of background here, the problem we
> > > > have encountered in production since the idempotent producer became
> the
> > > > default is OOM errors due to huge numbers of producerIds that had to
> be
> > > > retained in the cache for 7 days. It is hard to prevent use cases
> from
> > > > emerging where producers are used and discarded rapidly. We will be
> > > using a
> > > > lower value for sure, but it would also be nice to reduce the
> > likelihood
> > > > for the community to see this problem. The benefit of the caching
> > > > diminishes quickly over time since it is primarily meant to handle
> > > producer
> > > > retry windows. I do not think there is much difference between 1 days
> > > and 7
> > > > days from an application perspective, but it is a huge difference for
> > the
> > > > broker's memory usage.
> > > >
> > > > Best,
> > > > Jason
> > > >
> > > > On Wed, Jul 27, 2022 at 2:57 AM Sagar <sa...@gmail.com>
> > wrote:
> > > >
> > > > > Thanks Justine for the KIP. I think it might be better to document
> > the
> > > > > correlation between the new config and delivery.timeout.ms in the
> > > Public
> > > > > Interfaces Description.
> > > > >
> > > > > Also, I agree with Luke that for now setting a default to -1 should
> > be
> > > > > good. We can look to switch to 1 day with major release.
> > > > >
> > > > > Thanks!
> > > > > Sagar.
> > > > >
> > > > > On Wed, Jul 27, 2022 at 9:05 AM Luke Chen <sh...@gmail.com>
> wrote:
> > > > >
> > > > > > Hi Justine,
> > > > > >
> > > > > > Thanks for the KIP.
> > > > > > I agree with you that we should try our best to keep backward
> > > > > > compatibility, although our intention is to have lower producer
> id
> > > > > > expiration timeout.
> > > > > > So, I think we should keep default to -1 IMO.
> > > > > > Maybe we change the default to 1 day in next major release (4.0)?
> > > > > >
> > > > > > Thank you.
> > > > > > Luke
> > > > > >
> > > > > > On Wed, Jul 27, 2022 at 7:13 AM Justine Olshan
> > > > > > <jo...@confluent.io.invalid>
> > > > > > wrote:
> > > > > >
> > > > > > > Thanks for taking a look Jason!
> > > > > > >
> > > > > > > I wondered if we wanted to have a smaller default but wasn't
> sure
> > > > about
> > > > > > the
> > > > > > > compatibility story -- especially since there is the chance for
> > > > > producer
> > > > > > > IDs to expire silently.
> > > > > > > I do think that 1 day is fairly reasonable. If I don't hear any
> > > > > > conflicting
> > > > > > > opinions I can go ahead and update the default.
> > > > > > >
> > > > > > > Justine
> > > > > > >
> > > > > > > On Tue, Jul 26, 2022 at 12:23 PM Jason Gustafson
> > > > > > > <ja...@confluent.io.invalid>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Justine,
> > > > > > > >
> > > > > > > > Thanks for the KIP. Although I hate seeing new
> configurations,
> > I
> > > > > think
> > > > > > > this
> > > > > > > > is a good change. Combining these timeout behaviors into a
> > single
> > > > > > > > configuration was definitely a mistake, but we didn't
> > anticipate
> > > > the
> > > > > > > > problem with the producer id cache. I do wonder if we can
> make
> > > the
> > > > > > > default
> > > > > > > > a bit lower to reduce the chances that users will hit the
> same
> > > > memory
> > > > > > > > issues we have seen. After decoupling this configuration from
> > > > > > > > transactional.id.expiration.ms, the new timeout just needs
> to
> > > > cover
> > > > > > the
> > > > > > > > longest duration that a producer might be retrying the same
> > > Produce
> > > > > > > > request. 7 days seems too high. Although I think it could go
> a
> > > fair
> > > > > > even
> > > > > > > > lower, perhaps 1 day is a reasonable place to start?
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Jason
> > > > > > > >
> > > > > > > > On Mon, Jul 25, 2022 at 9:25 AM Justine Olshan
> > > > > > > > <jo...@confluent.io.invalid>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hey Bill,
> > > > > > > > > Thanks! I was just going to say that hopefully
> > > > > > > > > transactional.id.expiration.ms would also be over the
> > delivery
> > > > > > > timeout.
> > > > > > > > :)
> > > > > > > > > Thanks for the +1!
> > > > > > > > >
> > > > > > > > > Justine
> > > > > > > > >
> > > > > > > > > On Mon, Jul 25, 2022 at 9:17 AM Bill Bejeck <
> > bbejeck@gmail.com
> > > >
> > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi Justine,
> > > > > > > > > >
> > > > > > > > > > I just took another look at the KIP, and I realize my
> > > > > > > > question/suggestion
> > > > > > > > > > about default values has already been addressed in the
> > > > > > > `Compatibility`
> > > > > > > > > > section.
> > > > > > > > > >
> > > > > > > > > > I'm +1 on the KIP.
> > > > > > > > > >
> > > > > > > > > > -Bill
> > > > > > > > > >
> > > > > > > > > > On Thu, Jul 21, 2022 at 6:20 PM Bill Bejeck <
> > > bbejeck@gmail.com
> > > > >
> > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi Justine,
> > > > > > > > > > >
> > > > > > > > > > > Thanks for the well written KIP, this looks like it
> will
> > > be a
> > > > > > > useful
> > > > > > > > > > > addition.
> > > > > > > > > > >
> > > > > > > > > > > Overall the KIP looks good to me, I have one
> > > > question/comment.
> > > > > > > > > > >
> > > > > > > > > > > You mentioned that setting the `
> > producer.id.expiration.ms`
> > > > > less
> > > > > > > than
> > > > > > > > > the
> > > > > > > > > > > delivery timeout could lead to duplicates, which makes
> > > sense.
> > > > > To
> > > > > > > > help
> > > > > > > > > > > avoid this situation, do we want to consider a default
> > > value
> > > > > that
> > > > > > > is
> > > > > > > > > the
> > > > > > > > > > > same as the delivery timeout?
> > > > > > > > > > >
> > > > > > > > > > > Thanks again for the KIP.
> > > > > > > > > > >
> > > > > > > > > > > Bill
> > > > > > > > > > >
> > > > > > > > > > > On Thu, Jul 21, 2022 at 4:54 PM Justine Olshan
> > > > > > > > > > > <jo...@confluent.io.invalid> wrote:
> > > > > > > > > > >
> > > > > > > > > > >> Hey all!
> > > > > > > > > > >>
> > > > > > > > > > >> I'd like to start a discussion on my proposal to
> > separate
> > > > > > > time-based
> > > > > > > > > > >> producer ID expiration from transactional ID
> expiration
> > by
> > > > > > > > > introducing a
> > > > > > > > > > >> new configuration.
> > > > > > > > > > >>
> > > > > > > > > > >> The KIP Is pretty small and simple, but will be
> helpful
> > in
> > > > > > > > controlling
> > > > > > > > > > >> memory usage in brokers -- especially now that by
> > default
> > > > > > > producers
> > > > > > > > > are
> > > > > > > > > > >> idempotent and create producer ID state.
> > > > > > > > > > >>
> > > > > > > > > > >> Please take a look and leave any comments you may
> have!
> > > > > > > > > > >>
> > > > > > > > > > >> KIP:
> > > > > > > > > > >>
> > > > > > > > > > >>
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-854+Separate+configuration+for+producer+ID+expiry
> > > > > > > > > > >> JIRA:
> https://issues.apache.org/jira/browse/KAFKA-14097
> > > > > > > > > > >>
> > > > > > > > > > >> Thanks!
> > > > > > > > > > >> Justine
> > > > > > > > > > >>
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-854 Separate configuration for producer ID expiry

Posted by Ismael Juma <is...@juma.me.uk>.
I would remove -1 altogether. Two more comments:

1. The current description kind of leads people towards aligning the config
with delivery.timeout.ms. Is that what we want? We could say it should be
higher than delivery.timeout.ms but indicate that the default is usually
fine. The main reason to reduce it would be to save memory, I guess.
2. Each config has a priority, we should specify it for this one. I'm
assuming it will be "low".

Ismael

On Fri, Jul 29, 2022 at 2:38 PM Justine Olshan <jo...@confluent.io.invalid>
wrote:

> Hi all,
>
> I've updated the KIP to include the new default of 1 day and information
> about -1 in the description of the config.
> I wonder though if including -1 makes sense now that it is not the default
> value. Is there a benefit for manually setting -1 vs manually setting the
> value that transactional.id.expiration.ms has?
>
> Let me know your thoughts.
> Thanks,
> Justine
>
> On Fri, Jul 29, 2022 at 10:54 AM Ismael Juma <is...@juma.me.uk> wrote:
>
> > +1 for having 1 day as the default and for including this change in the
> > release notes.
> >
> > Ismael
> >
> > On Wed, Jul 27, 2022 at 9:16 AM Jason Gustafson
> <jason@confluent.io.invalid
> > >
> > wrote:
> >
> > > I don't think a major release is a requirement for a default change for
> > > what it's worth. I do appreciate that there is a preference for not
> > rocking
> > > the boat though. For a little bit of background here, the problem we
> > > have encountered in production since the idempotent producer became the
> > > default is OOM errors due to huge numbers of producerIds that had to be
> > > retained in the cache for 7 days. It is hard to prevent use cases from
> > > emerging where producers are used and discarded rapidly. We will be
> > using a
> > > lower value for sure, but it would also be nice to reduce the
> likelihood
> > > for the community to see this problem. The benefit of the caching
> > > diminishes quickly over time since it is primarily meant to handle
> > producer
> > > retry windows. I do not think there is much difference between 1 days
> > and 7
> > > days from an application perspective, but it is a huge difference for
> the
> > > broker's memory usage.
> > >
> > > Best,
> > > Jason
> > >
> > > On Wed, Jul 27, 2022 at 2:57 AM Sagar <sa...@gmail.com>
> wrote:
> > >
> > > > Thanks Justine for the KIP. I think it might be better to document
> the
> > > > correlation between the new config and delivery.timeout.ms in the
> > Public
> > > > Interfaces Description.
> > > >
> > > > Also, I agree with Luke that for now setting a default to -1 should
> be
> > > > good. We can look to switch to 1 day with major release.
> > > >
> > > > Thanks!
> > > > Sagar.
> > > >
> > > > On Wed, Jul 27, 2022 at 9:05 AM Luke Chen <sh...@gmail.com> wrote:
> > > >
> > > > > Hi Justine,
> > > > >
> > > > > Thanks for the KIP.
> > > > > I agree with you that we should try our best to keep backward
> > > > > compatibility, although our intention is to have lower producer id
> > > > > expiration timeout.
> > > > > So, I think we should keep default to -1 IMO.
> > > > > Maybe we change the default to 1 day in next major release (4.0)?
> > > > >
> > > > > Thank you.
> > > > > Luke
> > > > >
> > > > > On Wed, Jul 27, 2022 at 7:13 AM Justine Olshan
> > > > > <jo...@confluent.io.invalid>
> > > > > wrote:
> > > > >
> > > > > > Thanks for taking a look Jason!
> > > > > >
> > > > > > I wondered if we wanted to have a smaller default but wasn't sure
> > > about
> > > > > the
> > > > > > compatibility story -- especially since there is the chance for
> > > > producer
> > > > > > IDs to expire silently.
> > > > > > I do think that 1 day is fairly reasonable. If I don't hear any
> > > > > conflicting
> > > > > > opinions I can go ahead and update the default.
> > > > > >
> > > > > > Justine
> > > > > >
> > > > > > On Tue, Jul 26, 2022 at 12:23 PM Jason Gustafson
> > > > > > <ja...@confluent.io.invalid>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Justine,
> > > > > > >
> > > > > > > Thanks for the KIP. Although I hate seeing new configurations,
> I
> > > > think
> > > > > > this
> > > > > > > is a good change. Combining these timeout behaviors into a
> single
> > > > > > > configuration was definitely a mistake, but we didn't
> anticipate
> > > the
> > > > > > > problem with the producer id cache. I do wonder if we can make
> > the
> > > > > > default
> > > > > > > a bit lower to reduce the chances that users will hit the same
> > > memory
> > > > > > > issues we have seen. After decoupling this configuration from
> > > > > > > transactional.id.expiration.ms, the new timeout just needs to
> > > cover
> > > > > the
> > > > > > > longest duration that a producer might be retrying the same
> > Produce
> > > > > > > request. 7 days seems too high. Although I think it could go a
> > fair
> > > > > even
> > > > > > > lower, perhaps 1 day is a reasonable place to start?
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Jason
> > > > > > >
> > > > > > > On Mon, Jul 25, 2022 at 9:25 AM Justine Olshan
> > > > > > > <jo...@confluent.io.invalid>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hey Bill,
> > > > > > > > Thanks! I was just going to say that hopefully
> > > > > > > > transactional.id.expiration.ms would also be over the
> delivery
> > > > > > timeout.
> > > > > > > :)
> > > > > > > > Thanks for the +1!
> > > > > > > >
> > > > > > > > Justine
> > > > > > > >
> > > > > > > > On Mon, Jul 25, 2022 at 9:17 AM Bill Bejeck <
> bbejeck@gmail.com
> > >
> > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi Justine,
> > > > > > > > >
> > > > > > > > > I just took another look at the KIP, and I realize my
> > > > > > > question/suggestion
> > > > > > > > > about default values has already been addressed in the
> > > > > > `Compatibility`
> > > > > > > > > section.
> > > > > > > > >
> > > > > > > > > I'm +1 on the KIP.
> > > > > > > > >
> > > > > > > > > -Bill
> > > > > > > > >
> > > > > > > > > On Thu, Jul 21, 2022 at 6:20 PM Bill Bejeck <
> > bbejeck@gmail.com
> > > >
> > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi Justine,
> > > > > > > > > >
> > > > > > > > > > Thanks for the well written KIP, this looks like it will
> > be a
> > > > > > useful
> > > > > > > > > > addition.
> > > > > > > > > >
> > > > > > > > > > Overall the KIP looks good to me, I have one
> > > question/comment.
> > > > > > > > > >
> > > > > > > > > > You mentioned that setting the `
> producer.id.expiration.ms`
> > > > less
> > > > > > than
> > > > > > > > the
> > > > > > > > > > delivery timeout could lead to duplicates, which makes
> > sense.
> > > > To
> > > > > > > help
> > > > > > > > > > avoid this situation, do we want to consider a default
> > value
> > > > that
> > > > > > is
> > > > > > > > the
> > > > > > > > > > same as the delivery timeout?
> > > > > > > > > >
> > > > > > > > > > Thanks again for the KIP.
> > > > > > > > > >
> > > > > > > > > > Bill
> > > > > > > > > >
> > > > > > > > > > On Thu, Jul 21, 2022 at 4:54 PM Justine Olshan
> > > > > > > > > > <jo...@confluent.io.invalid> wrote:
> > > > > > > > > >
> > > > > > > > > >> Hey all!
> > > > > > > > > >>
> > > > > > > > > >> I'd like to start a discussion on my proposal to
> separate
> > > > > > time-based
> > > > > > > > > >> producer ID expiration from transactional ID expiration
> by
> > > > > > > > introducing a
> > > > > > > > > >> new configuration.
> > > > > > > > > >>
> > > > > > > > > >> The KIP Is pretty small and simple, but will be helpful
> in
> > > > > > > controlling
> > > > > > > > > >> memory usage in brokers -- especially now that by
> default
> > > > > > producers
> > > > > > > > are
> > > > > > > > > >> idempotent and create producer ID state.
> > > > > > > > > >>
> > > > > > > > > >> Please take a look and leave any comments you may have!
> > > > > > > > > >>
> > > > > > > > > >> KIP:
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-854+Separate+configuration+for+producer+ID+expiry
> > > > > > > > > >> JIRA: https://issues.apache.org/jira/browse/KAFKA-14097
> > > > > > > > > >>
> > > > > > > > > >> Thanks!
> > > > > > > > > >> Justine
> > > > > > > > > >>
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-854 Separate configuration for producer ID expiry

Posted by Justine Olshan <jo...@confluent.io.INVALID>.
Hi all,

I've updated the KIP to include the new default of 1 day and information
about -1 in the description of the config.
I wonder though if including -1 makes sense now that it is not the default
value. Is there a benefit for manually setting -1 vs manually setting the
value that transactional.id.expiration.ms has?

Let me know your thoughts.
Thanks,
Justine

On Fri, Jul 29, 2022 at 10:54 AM Ismael Juma <is...@juma.me.uk> wrote:

> +1 for having 1 day as the default and for including this change in the
> release notes.
>
> Ismael
>
> On Wed, Jul 27, 2022 at 9:16 AM Jason Gustafson <jason@confluent.io.invalid
> >
> wrote:
>
> > I don't think a major release is a requirement for a default change for
> > what it's worth. I do appreciate that there is a preference for not
> rocking
> > the boat though. For a little bit of background here, the problem we
> > have encountered in production since the idempotent producer became the
> > default is OOM errors due to huge numbers of producerIds that had to be
> > retained in the cache for 7 days. It is hard to prevent use cases from
> > emerging where producers are used and discarded rapidly. We will be
> using a
> > lower value for sure, but it would also be nice to reduce the likelihood
> > for the community to see this problem. The benefit of the caching
> > diminishes quickly over time since it is primarily meant to handle
> producer
> > retry windows. I do not think there is much difference between 1 days
> and 7
> > days from an application perspective, but it is a huge difference for the
> > broker's memory usage.
> >
> > Best,
> > Jason
> >
> > On Wed, Jul 27, 2022 at 2:57 AM Sagar <sa...@gmail.com> wrote:
> >
> > > Thanks Justine for the KIP. I think it might be better to document the
> > > correlation between the new config and delivery.timeout.ms in the
> Public
> > > Interfaces Description.
> > >
> > > Also, I agree with Luke that for now setting a default to -1 should be
> > > good. We can look to switch to 1 day with major release.
> > >
> > > Thanks!
> > > Sagar.
> > >
> > > On Wed, Jul 27, 2022 at 9:05 AM Luke Chen <sh...@gmail.com> wrote:
> > >
> > > > Hi Justine,
> > > >
> > > > Thanks for the KIP.
> > > > I agree with you that we should try our best to keep backward
> > > > compatibility, although our intention is to have lower producer id
> > > > expiration timeout.
> > > > So, I think we should keep default to -1 IMO.
> > > > Maybe we change the default to 1 day in next major release (4.0)?
> > > >
> > > > Thank you.
> > > > Luke
> > > >
> > > > On Wed, Jul 27, 2022 at 7:13 AM Justine Olshan
> > > > <jo...@confluent.io.invalid>
> > > > wrote:
> > > >
> > > > > Thanks for taking a look Jason!
> > > > >
> > > > > I wondered if we wanted to have a smaller default but wasn't sure
> > about
> > > > the
> > > > > compatibility story -- especially since there is the chance for
> > > producer
> > > > > IDs to expire silently.
> > > > > I do think that 1 day is fairly reasonable. If I don't hear any
> > > > conflicting
> > > > > opinions I can go ahead and update the default.
> > > > >
> > > > > Justine
> > > > >
> > > > > On Tue, Jul 26, 2022 at 12:23 PM Jason Gustafson
> > > > > <ja...@confluent.io.invalid>
> > > > > wrote:
> > > > >
> > > > > > Hi Justine,
> > > > > >
> > > > > > Thanks for the KIP. Although I hate seeing new configurations, I
> > > think
> > > > > this
> > > > > > is a good change. Combining these timeout behaviors into a single
> > > > > > configuration was definitely a mistake, but we didn't anticipate
> > the
> > > > > > problem with the producer id cache. I do wonder if we can make
> the
> > > > > default
> > > > > > a bit lower to reduce the chances that users will hit the same
> > memory
> > > > > > issues we have seen. After decoupling this configuration from
> > > > > > transactional.id.expiration.ms, the new timeout just needs to
> > cover
> > > > the
> > > > > > longest duration that a producer might be retrying the same
> Produce
> > > > > > request. 7 days seems too high. Although I think it could go a
> fair
> > > > even
> > > > > > lower, perhaps 1 day is a reasonable place to start?
> > > > > >
> > > > > > Thanks,
> > > > > > Jason
> > > > > >
> > > > > > On Mon, Jul 25, 2022 at 9:25 AM Justine Olshan
> > > > > > <jo...@confluent.io.invalid>
> > > > > > wrote:
> > > > > >
> > > > > > > Hey Bill,
> > > > > > > Thanks! I was just going to say that hopefully
> > > > > > > transactional.id.expiration.ms would also be over the delivery
> > > > > timeout.
> > > > > > :)
> > > > > > > Thanks for the +1!
> > > > > > >
> > > > > > > Justine
> > > > > > >
> > > > > > > On Mon, Jul 25, 2022 at 9:17 AM Bill Bejeck <bbejeck@gmail.com
> >
> > > > wrote:
> > > > > > >
> > > > > > > > Hi Justine,
> > > > > > > >
> > > > > > > > I just took another look at the KIP, and I realize my
> > > > > > question/suggestion
> > > > > > > > about default values has already been addressed in the
> > > > > `Compatibility`
> > > > > > > > section.
> > > > > > > >
> > > > > > > > I'm +1 on the KIP.
> > > > > > > >
> > > > > > > > -Bill
> > > > > > > >
> > > > > > > > On Thu, Jul 21, 2022 at 6:20 PM Bill Bejeck <
> bbejeck@gmail.com
> > >
> > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi Justine,
> > > > > > > > >
> > > > > > > > > Thanks for the well written KIP, this looks like it will
> be a
> > > > > useful
> > > > > > > > > addition.
> > > > > > > > >
> > > > > > > > > Overall the KIP looks good to me, I have one
> > question/comment.
> > > > > > > > >
> > > > > > > > > You mentioned that setting the `producer.id.expiration.ms`
> > > less
> > > > > than
> > > > > > > the
> > > > > > > > > delivery timeout could lead to duplicates, which makes
> sense.
> > > To
> > > > > > help
> > > > > > > > > avoid this situation, do we want to consider a default
> value
> > > that
> > > > > is
> > > > > > > the
> > > > > > > > > same as the delivery timeout?
> > > > > > > > >
> > > > > > > > > Thanks again for the KIP.
> > > > > > > > >
> > > > > > > > > Bill
> > > > > > > > >
> > > > > > > > > On Thu, Jul 21, 2022 at 4:54 PM Justine Olshan
> > > > > > > > > <jo...@confluent.io.invalid> wrote:
> > > > > > > > >
> > > > > > > > >> Hey all!
> > > > > > > > >>
> > > > > > > > >> I'd like to start a discussion on my proposal to separate
> > > > > time-based
> > > > > > > > >> producer ID expiration from transactional ID expiration by
> > > > > > > introducing a
> > > > > > > > >> new configuration.
> > > > > > > > >>
> > > > > > > > >> The KIP Is pretty small and simple, but will be helpful in
> > > > > > controlling
> > > > > > > > >> memory usage in brokers -- especially now that by default
> > > > > producers
> > > > > > > are
> > > > > > > > >> idempotent and create producer ID state.
> > > > > > > > >>
> > > > > > > > >> Please take a look and leave any comments you may have!
> > > > > > > > >>
> > > > > > > > >> KIP:
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-854+Separate+configuration+for+producer+ID+expiry
> > > > > > > > >> JIRA: https://issues.apache.org/jira/browse/KAFKA-14097
> > > > > > > > >>
> > > > > > > > >> Thanks!
> > > > > > > > >> Justine
> > > > > > > > >>
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-854 Separate configuration for producer ID expiry

Posted by Ismael Juma <is...@juma.me.uk>.
+1 for having 1 day as the default and for including this change in the
release notes.

Ismael

On Wed, Jul 27, 2022 at 9:16 AM Jason Gustafson <ja...@confluent.io.invalid>
wrote:

> I don't think a major release is a requirement for a default change for
> what it's worth. I do appreciate that there is a preference for not rocking
> the boat though. For a little bit of background here, the problem we
> have encountered in production since the idempotent producer became the
> default is OOM errors due to huge numbers of producerIds that had to be
> retained in the cache for 7 days. It is hard to prevent use cases from
> emerging where producers are used and discarded rapidly. We will be using a
> lower value for sure, but it would also be nice to reduce the likelihood
> for the community to see this problem. The benefit of the caching
> diminishes quickly over time since it is primarily meant to handle producer
> retry windows. I do not think there is much difference between 1 days and 7
> days from an application perspective, but it is a huge difference for the
> broker's memory usage.
>
> Best,
> Jason
>
> On Wed, Jul 27, 2022 at 2:57 AM Sagar <sa...@gmail.com> wrote:
>
> > Thanks Justine for the KIP. I think it might be better to document the
> > correlation between the new config and delivery.timeout.ms in the Public
> > Interfaces Description.
> >
> > Also, I agree with Luke that for now setting a default to -1 should be
> > good. We can look to switch to 1 day with major release.
> >
> > Thanks!
> > Sagar.
> >
> > On Wed, Jul 27, 2022 at 9:05 AM Luke Chen <sh...@gmail.com> wrote:
> >
> > > Hi Justine,
> > >
> > > Thanks for the KIP.
> > > I agree with you that we should try our best to keep backward
> > > compatibility, although our intention is to have lower producer id
> > > expiration timeout.
> > > So, I think we should keep default to -1 IMO.
> > > Maybe we change the default to 1 day in next major release (4.0)?
> > >
> > > Thank you.
> > > Luke
> > >
> > > On Wed, Jul 27, 2022 at 7:13 AM Justine Olshan
> > > <jo...@confluent.io.invalid>
> > > wrote:
> > >
> > > > Thanks for taking a look Jason!
> > > >
> > > > I wondered if we wanted to have a smaller default but wasn't sure
> about
> > > the
> > > > compatibility story -- especially since there is the chance for
> > producer
> > > > IDs to expire silently.
> > > > I do think that 1 day is fairly reasonable. If I don't hear any
> > > conflicting
> > > > opinions I can go ahead and update the default.
> > > >
> > > > Justine
> > > >
> > > > On Tue, Jul 26, 2022 at 12:23 PM Jason Gustafson
> > > > <ja...@confluent.io.invalid>
> > > > wrote:
> > > >
> > > > > Hi Justine,
> > > > >
> > > > > Thanks for the KIP. Although I hate seeing new configurations, I
> > think
> > > > this
> > > > > is a good change. Combining these timeout behaviors into a single
> > > > > configuration was definitely a mistake, but we didn't anticipate
> the
> > > > > problem with the producer id cache. I do wonder if we can make the
> > > > default
> > > > > a bit lower to reduce the chances that users will hit the same
> memory
> > > > > issues we have seen. After decoupling this configuration from
> > > > > transactional.id.expiration.ms, the new timeout just needs to
> cover
> > > the
> > > > > longest duration that a producer might be retrying the same Produce
> > > > > request. 7 days seems too high. Although I think it could go a fair
> > > even
> > > > > lower, perhaps 1 day is a reasonable place to start?
> > > > >
> > > > > Thanks,
> > > > > Jason
> > > > >
> > > > > On Mon, Jul 25, 2022 at 9:25 AM Justine Olshan
> > > > > <jo...@confluent.io.invalid>
> > > > > wrote:
> > > > >
> > > > > > Hey Bill,
> > > > > > Thanks! I was just going to say that hopefully
> > > > > > transactional.id.expiration.ms would also be over the delivery
> > > > timeout.
> > > > > :)
> > > > > > Thanks for the +1!
> > > > > >
> > > > > > Justine
> > > > > >
> > > > > > On Mon, Jul 25, 2022 at 9:17 AM Bill Bejeck <bb...@gmail.com>
> > > wrote:
> > > > > >
> > > > > > > Hi Justine,
> > > > > > >
> > > > > > > I just took another look at the KIP, and I realize my
> > > > > question/suggestion
> > > > > > > about default values has already been addressed in the
> > > > `Compatibility`
> > > > > > > section.
> > > > > > >
> > > > > > > I'm +1 on the KIP.
> > > > > > >
> > > > > > > -Bill
> > > > > > >
> > > > > > > On Thu, Jul 21, 2022 at 6:20 PM Bill Bejeck <bbejeck@gmail.com
> >
> > > > wrote:
> > > > > > >
> > > > > > > > Hi Justine,
> > > > > > > >
> > > > > > > > Thanks for the well written KIP, this looks like it will be a
> > > > useful
> > > > > > > > addition.
> > > > > > > >
> > > > > > > > Overall the KIP looks good to me, I have one
> question/comment.
> > > > > > > >
> > > > > > > > You mentioned that setting the `producer.id.expiration.ms`
> > less
> > > > than
> > > > > > the
> > > > > > > > delivery timeout could lead to duplicates, which makes sense.
> > To
> > > > > help
> > > > > > > > avoid this situation, do we want to consider a default value
> > that
> > > > is
> > > > > > the
> > > > > > > > same as the delivery timeout?
> > > > > > > >
> > > > > > > > Thanks again for the KIP.
> > > > > > > >
> > > > > > > > Bill
> > > > > > > >
> > > > > > > > On Thu, Jul 21, 2022 at 4:54 PM Justine Olshan
> > > > > > > > <jo...@confluent.io.invalid> wrote:
> > > > > > > >
> > > > > > > >> Hey all!
> > > > > > > >>
> > > > > > > >> I'd like to start a discussion on my proposal to separate
> > > > time-based
> > > > > > > >> producer ID expiration from transactional ID expiration by
> > > > > > introducing a
> > > > > > > >> new configuration.
> > > > > > > >>
> > > > > > > >> The KIP Is pretty small and simple, but will be helpful in
> > > > > controlling
> > > > > > > >> memory usage in brokers -- especially now that by default
> > > > producers
> > > > > > are
> > > > > > > >> idempotent and create producer ID state.
> > > > > > > >>
> > > > > > > >> Please take a look and leave any comments you may have!
> > > > > > > >>
> > > > > > > >> KIP:
> > > > > > > >>
> > > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-854+Separate+configuration+for+producer+ID+expiry
> > > > > > > >> JIRA: https://issues.apache.org/jira/browse/KAFKA-14097
> > > > > > > >>
> > > > > > > >> Thanks!
> > > > > > > >> Justine
> > > > > > > >>
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-854 Separate configuration for producer ID expiry

Posted by Jason Gustafson <ja...@confluent.io.INVALID>.
I don't think a major release is a requirement for a default change for
what it's worth. I do appreciate that there is a preference for not rocking
the boat though. For a little bit of background here, the problem we
have encountered in production since the idempotent producer became the
default is OOM errors due to huge numbers of producerIds that had to be
retained in the cache for 7 days. It is hard to prevent use cases from
emerging where producers are used and discarded rapidly. We will be using a
lower value for sure, but it would also be nice to reduce the likelihood
for the community to see this problem. The benefit of the caching
diminishes quickly over time since it is primarily meant to handle producer
retry windows. I do not think there is much difference between 1 days and 7
days from an application perspective, but it is a huge difference for the
broker's memory usage.

Best,
Jason

On Wed, Jul 27, 2022 at 2:57 AM Sagar <sa...@gmail.com> wrote:

> Thanks Justine for the KIP. I think it might be better to document the
> correlation between the new config and delivery.timeout.ms in the Public
> Interfaces Description.
>
> Also, I agree with Luke that for now setting a default to -1 should be
> good. We can look to switch to 1 day with major release.
>
> Thanks!
> Sagar.
>
> On Wed, Jul 27, 2022 at 9:05 AM Luke Chen <sh...@gmail.com> wrote:
>
> > Hi Justine,
> >
> > Thanks for the KIP.
> > I agree with you that we should try our best to keep backward
> > compatibility, although our intention is to have lower producer id
> > expiration timeout.
> > So, I think we should keep default to -1 IMO.
> > Maybe we change the default to 1 day in next major release (4.0)?
> >
> > Thank you.
> > Luke
> >
> > On Wed, Jul 27, 2022 at 7:13 AM Justine Olshan
> > <jo...@confluent.io.invalid>
> > wrote:
> >
> > > Thanks for taking a look Jason!
> > >
> > > I wondered if we wanted to have a smaller default but wasn't sure about
> > the
> > > compatibility story -- especially since there is the chance for
> producer
> > > IDs to expire silently.
> > > I do think that 1 day is fairly reasonable. If I don't hear any
> > conflicting
> > > opinions I can go ahead and update the default.
> > >
> > > Justine
> > >
> > > On Tue, Jul 26, 2022 at 12:23 PM Jason Gustafson
> > > <ja...@confluent.io.invalid>
> > > wrote:
> > >
> > > > Hi Justine,
> > > >
> > > > Thanks for the KIP. Although I hate seeing new configurations, I
> think
> > > this
> > > > is a good change. Combining these timeout behaviors into a single
> > > > configuration was definitely a mistake, but we didn't anticipate the
> > > > problem with the producer id cache. I do wonder if we can make the
> > > default
> > > > a bit lower to reduce the chances that users will hit the same memory
> > > > issues we have seen. After decoupling this configuration from
> > > > transactional.id.expiration.ms, the new timeout just needs to cover
> > the
> > > > longest duration that a producer might be retrying the same Produce
> > > > request. 7 days seems too high. Although I think it could go a fair
> > even
> > > > lower, perhaps 1 day is a reasonable place to start?
> > > >
> > > > Thanks,
> > > > Jason
> > > >
> > > > On Mon, Jul 25, 2022 at 9:25 AM Justine Olshan
> > > > <jo...@confluent.io.invalid>
> > > > wrote:
> > > >
> > > > > Hey Bill,
> > > > > Thanks! I was just going to say that hopefully
> > > > > transactional.id.expiration.ms would also be over the delivery
> > > timeout.
> > > > :)
> > > > > Thanks for the +1!
> > > > >
> > > > > Justine
> > > > >
> > > > > On Mon, Jul 25, 2022 at 9:17 AM Bill Bejeck <bb...@gmail.com>
> > wrote:
> > > > >
> > > > > > Hi Justine,
> > > > > >
> > > > > > I just took another look at the KIP, and I realize my
> > > > question/suggestion
> > > > > > about default values has already been addressed in the
> > > `Compatibility`
> > > > > > section.
> > > > > >
> > > > > > I'm +1 on the KIP.
> > > > > >
> > > > > > -Bill
> > > > > >
> > > > > > On Thu, Jul 21, 2022 at 6:20 PM Bill Bejeck <bb...@gmail.com>
> > > wrote:
> > > > > >
> > > > > > > Hi Justine,
> > > > > > >
> > > > > > > Thanks for the well written KIP, this looks like it will be a
> > > useful
> > > > > > > addition.
> > > > > > >
> > > > > > > Overall the KIP looks good to me, I have one question/comment.
> > > > > > >
> > > > > > > You mentioned that setting the `producer.id.expiration.ms`
> less
> > > than
> > > > > the
> > > > > > > delivery timeout could lead to duplicates, which makes sense.
> To
> > > > help
> > > > > > > avoid this situation, do we want to consider a default value
> that
> > > is
> > > > > the
> > > > > > > same as the delivery timeout?
> > > > > > >
> > > > > > > Thanks again for the KIP.
> > > > > > >
> > > > > > > Bill
> > > > > > >
> > > > > > > On Thu, Jul 21, 2022 at 4:54 PM Justine Olshan
> > > > > > > <jo...@confluent.io.invalid> wrote:
> > > > > > >
> > > > > > >> Hey all!
> > > > > > >>
> > > > > > >> I'd like to start a discussion on my proposal to separate
> > > time-based
> > > > > > >> producer ID expiration from transactional ID expiration by
> > > > > introducing a
> > > > > > >> new configuration.
> > > > > > >>
> > > > > > >> The KIP Is pretty small and simple, but will be helpful in
> > > > controlling
> > > > > > >> memory usage in brokers -- especially now that by default
> > > producers
> > > > > are
> > > > > > >> idempotent and create producer ID state.
> > > > > > >>
> > > > > > >> Please take a look and leave any comments you may have!
> > > > > > >>
> > > > > > >> KIP:
> > > > > > >>
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-854+Separate+configuration+for+producer+ID+expiry
> > > > > > >> JIRA: https://issues.apache.org/jira/browse/KAFKA-14097
> > > > > > >>
> > > > > > >> Thanks!
> > > > > > >> Justine
> > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-854 Separate configuration for producer ID expiry

Posted by Sagar <sa...@gmail.com>.
Thanks Justine for the KIP. I think it might be better to document the
correlation between the new config and delivery.timeout.ms in the Public
Interfaces Description.

Also, I agree with Luke that for now setting a default to -1 should be
good. We can look to switch to 1 day with major release.

Thanks!
Sagar.

On Wed, Jul 27, 2022 at 9:05 AM Luke Chen <sh...@gmail.com> wrote:

> Hi Justine,
>
> Thanks for the KIP.
> I agree with you that we should try our best to keep backward
> compatibility, although our intention is to have lower producer id
> expiration timeout.
> So, I think we should keep default to -1 IMO.
> Maybe we change the default to 1 day in next major release (4.0)?
>
> Thank you.
> Luke
>
> On Wed, Jul 27, 2022 at 7:13 AM Justine Olshan
> <jo...@confluent.io.invalid>
> wrote:
>
> > Thanks for taking a look Jason!
> >
> > I wondered if we wanted to have a smaller default but wasn't sure about
> the
> > compatibility story -- especially since there is the chance for producer
> > IDs to expire silently.
> > I do think that 1 day is fairly reasonable. If I don't hear any
> conflicting
> > opinions I can go ahead and update the default.
> >
> > Justine
> >
> > On Tue, Jul 26, 2022 at 12:23 PM Jason Gustafson
> > <ja...@confluent.io.invalid>
> > wrote:
> >
> > > Hi Justine,
> > >
> > > Thanks for the KIP. Although I hate seeing new configurations, I think
> > this
> > > is a good change. Combining these timeout behaviors into a single
> > > configuration was definitely a mistake, but we didn't anticipate the
> > > problem with the producer id cache. I do wonder if we can make the
> > default
> > > a bit lower to reduce the chances that users will hit the same memory
> > > issues we have seen. After decoupling this configuration from
> > > transactional.id.expiration.ms, the new timeout just needs to cover
> the
> > > longest duration that a producer might be retrying the same Produce
> > > request. 7 days seems too high. Although I think it could go a fair
> even
> > > lower, perhaps 1 day is a reasonable place to start?
> > >
> > > Thanks,
> > > Jason
> > >
> > > On Mon, Jul 25, 2022 at 9:25 AM Justine Olshan
> > > <jo...@confluent.io.invalid>
> > > wrote:
> > >
> > > > Hey Bill,
> > > > Thanks! I was just going to say that hopefully
> > > > transactional.id.expiration.ms would also be over the delivery
> > timeout.
> > > :)
> > > > Thanks for the +1!
> > > >
> > > > Justine
> > > >
> > > > On Mon, Jul 25, 2022 at 9:17 AM Bill Bejeck <bb...@gmail.com>
> wrote:
> > > >
> > > > > Hi Justine,
> > > > >
> > > > > I just took another look at the KIP, and I realize my
> > > question/suggestion
> > > > > about default values has already been addressed in the
> > `Compatibility`
> > > > > section.
> > > > >
> > > > > I'm +1 on the KIP.
> > > > >
> > > > > -Bill
> > > > >
> > > > > On Thu, Jul 21, 2022 at 6:20 PM Bill Bejeck <bb...@gmail.com>
> > wrote:
> > > > >
> > > > > > Hi Justine,
> > > > > >
> > > > > > Thanks for the well written KIP, this looks like it will be a
> > useful
> > > > > > addition.
> > > > > >
> > > > > > Overall the KIP looks good to me, I have one question/comment.
> > > > > >
> > > > > > You mentioned that setting the `producer.id.expiration.ms` less
> > than
> > > > the
> > > > > > delivery timeout could lead to duplicates, which makes sense.  To
> > > help
> > > > > > avoid this situation, do we want to consider a default value that
> > is
> > > > the
> > > > > > same as the delivery timeout?
> > > > > >
> > > > > > Thanks again for the KIP.
> > > > > >
> > > > > > Bill
> > > > > >
> > > > > > On Thu, Jul 21, 2022 at 4:54 PM Justine Olshan
> > > > > > <jo...@confluent.io.invalid> wrote:
> > > > > >
> > > > > >> Hey all!
> > > > > >>
> > > > > >> I'd like to start a discussion on my proposal to separate
> > time-based
> > > > > >> producer ID expiration from transactional ID expiration by
> > > > introducing a
> > > > > >> new configuration.
> > > > > >>
> > > > > >> The KIP Is pretty small and simple, but will be helpful in
> > > controlling
> > > > > >> memory usage in brokers -- especially now that by default
> > producers
> > > > are
> > > > > >> idempotent and create producer ID state.
> > > > > >>
> > > > > >> Please take a look and leave any comments you may have!
> > > > > >>
> > > > > >> KIP:
> > > > > >>
> > > > > >>
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-854+Separate+configuration+for+producer+ID+expiry
> > > > > >> JIRA: https://issues.apache.org/jira/browse/KAFKA-14097
> > > > > >>
> > > > > >> Thanks!
> > > > > >> Justine
> > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-854 Separate configuration for producer ID expiry

Posted by Luke Chen <sh...@gmail.com>.
Hi Justine,

Thanks for the KIP.
I agree with you that we should try our best to keep backward
compatibility, although our intention is to have lower producer id
expiration timeout.
So, I think we should keep default to -1 IMO.
Maybe we change the default to 1 day in next major release (4.0)?

Thank you.
Luke

On Wed, Jul 27, 2022 at 7:13 AM Justine Olshan <jo...@confluent.io.invalid>
wrote:

> Thanks for taking a look Jason!
>
> I wondered if we wanted to have a smaller default but wasn't sure about the
> compatibility story -- especially since there is the chance for producer
> IDs to expire silently.
> I do think that 1 day is fairly reasonable. If I don't hear any conflicting
> opinions I can go ahead and update the default.
>
> Justine
>
> On Tue, Jul 26, 2022 at 12:23 PM Jason Gustafson
> <ja...@confluent.io.invalid>
> wrote:
>
> > Hi Justine,
> >
> > Thanks for the KIP. Although I hate seeing new configurations, I think
> this
> > is a good change. Combining these timeout behaviors into a single
> > configuration was definitely a mistake, but we didn't anticipate the
> > problem with the producer id cache. I do wonder if we can make the
> default
> > a bit lower to reduce the chances that users will hit the same memory
> > issues we have seen. After decoupling this configuration from
> > transactional.id.expiration.ms, the new timeout just needs to cover the
> > longest duration that a producer might be retrying the same Produce
> > request. 7 days seems too high. Although I think it could go a fair even
> > lower, perhaps 1 day is a reasonable place to start?
> >
> > Thanks,
> > Jason
> >
> > On Mon, Jul 25, 2022 at 9:25 AM Justine Olshan
> > <jo...@confluent.io.invalid>
> > wrote:
> >
> > > Hey Bill,
> > > Thanks! I was just going to say that hopefully
> > > transactional.id.expiration.ms would also be over the delivery
> timeout.
> > :)
> > > Thanks for the +1!
> > >
> > > Justine
> > >
> > > On Mon, Jul 25, 2022 at 9:17 AM Bill Bejeck <bb...@gmail.com> wrote:
> > >
> > > > Hi Justine,
> > > >
> > > > I just took another look at the KIP, and I realize my
> > question/suggestion
> > > > about default values has already been addressed in the
> `Compatibility`
> > > > section.
> > > >
> > > > I'm +1 on the KIP.
> > > >
> > > > -Bill
> > > >
> > > > On Thu, Jul 21, 2022 at 6:20 PM Bill Bejeck <bb...@gmail.com>
> wrote:
> > > >
> > > > > Hi Justine,
> > > > >
> > > > > Thanks for the well written KIP, this looks like it will be a
> useful
> > > > > addition.
> > > > >
> > > > > Overall the KIP looks good to me, I have one question/comment.
> > > > >
> > > > > You mentioned that setting the `producer.id.expiration.ms` less
> than
> > > the
> > > > > delivery timeout could lead to duplicates, which makes sense.  To
> > help
> > > > > avoid this situation, do we want to consider a default value that
> is
> > > the
> > > > > same as the delivery timeout?
> > > > >
> > > > > Thanks again for the KIP.
> > > > >
> > > > > Bill
> > > > >
> > > > > On Thu, Jul 21, 2022 at 4:54 PM Justine Olshan
> > > > > <jo...@confluent.io.invalid> wrote:
> > > > >
> > > > >> Hey all!
> > > > >>
> > > > >> I'd like to start a discussion on my proposal to separate
> time-based
> > > > >> producer ID expiration from transactional ID expiration by
> > > introducing a
> > > > >> new configuration.
> > > > >>
> > > > >> The KIP Is pretty small and simple, but will be helpful in
> > controlling
> > > > >> memory usage in brokers -- especially now that by default
> producers
> > > are
> > > > >> idempotent and create producer ID state.
> > > > >>
> > > > >> Please take a look and leave any comments you may have!
> > > > >>
> > > > >> KIP:
> > > > >>
> > > > >>
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-854+Separate+configuration+for+producer+ID+expiry
> > > > >> JIRA: https://issues.apache.org/jira/browse/KAFKA-14097
> > > > >>
> > > > >> Thanks!
> > > > >> Justine
> > > > >>
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-854 Separate configuration for producer ID expiry

Posted by Justine Olshan <jo...@confluent.io.INVALID>.
Thanks for taking a look Jason!

I wondered if we wanted to have a smaller default but wasn't sure about the
compatibility story -- especially since there is the chance for producer
IDs to expire silently.
I do think that 1 day is fairly reasonable. If I don't hear any conflicting
opinions I can go ahead and update the default.

Justine

On Tue, Jul 26, 2022 at 12:23 PM Jason Gustafson <ja...@confluent.io.invalid>
wrote:

> Hi Justine,
>
> Thanks for the KIP. Although I hate seeing new configurations, I think this
> is a good change. Combining these timeout behaviors into a single
> configuration was definitely a mistake, but we didn't anticipate the
> problem with the producer id cache. I do wonder if we can make the default
> a bit lower to reduce the chances that users will hit the same memory
> issues we have seen. After decoupling this configuration from
> transactional.id.expiration.ms, the new timeout just needs to cover the
> longest duration that a producer might be retrying the same Produce
> request. 7 days seems too high. Although I think it could go a fair even
> lower, perhaps 1 day is a reasonable place to start?
>
> Thanks,
> Jason
>
> On Mon, Jul 25, 2022 at 9:25 AM Justine Olshan
> <jo...@confluent.io.invalid>
> wrote:
>
> > Hey Bill,
> > Thanks! I was just going to say that hopefully
> > transactional.id.expiration.ms would also be over the delivery timeout.
> :)
> > Thanks for the +1!
> >
> > Justine
> >
> > On Mon, Jul 25, 2022 at 9:17 AM Bill Bejeck <bb...@gmail.com> wrote:
> >
> > > Hi Justine,
> > >
> > > I just took another look at the KIP, and I realize my
> question/suggestion
> > > about default values has already been addressed in the `Compatibility`
> > > section.
> > >
> > > I'm +1 on the KIP.
> > >
> > > -Bill
> > >
> > > On Thu, Jul 21, 2022 at 6:20 PM Bill Bejeck <bb...@gmail.com> wrote:
> > >
> > > > Hi Justine,
> > > >
> > > > Thanks for the well written KIP, this looks like it will be a useful
> > > > addition.
> > > >
> > > > Overall the KIP looks good to me, I have one question/comment.
> > > >
> > > > You mentioned that setting the `producer.id.expiration.ms` less than
> > the
> > > > delivery timeout could lead to duplicates, which makes sense.  To
> help
> > > > avoid this situation, do we want to consider a default value that is
> > the
> > > > same as the delivery timeout?
> > > >
> > > > Thanks again for the KIP.
> > > >
> > > > Bill
> > > >
> > > > On Thu, Jul 21, 2022 at 4:54 PM Justine Olshan
> > > > <jo...@confluent.io.invalid> wrote:
> > > >
> > > >> Hey all!
> > > >>
> > > >> I'd like to start a discussion on my proposal to separate time-based
> > > >> producer ID expiration from transactional ID expiration by
> > introducing a
> > > >> new configuration.
> > > >>
> > > >> The KIP Is pretty small and simple, but will be helpful in
> controlling
> > > >> memory usage in brokers -- especially now that by default producers
> > are
> > > >> idempotent and create producer ID state.
> > > >>
> > > >> Please take a look and leave any comments you may have!
> > > >>
> > > >> KIP:
> > > >>
> > > >>
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-854+Separate+configuration+for+producer+ID+expiry
> > > >> JIRA: https://issues.apache.org/jira/browse/KAFKA-14097
> > > >>
> > > >> Thanks!
> > > >> Justine
> > > >>
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-854 Separate configuration for producer ID expiry

Posted by Jason Gustafson <ja...@confluent.io.INVALID>.
Hi Justine,

Thanks for the KIP. Although I hate seeing new configurations, I think this
is a good change. Combining these timeout behaviors into a single
configuration was definitely a mistake, but we didn't anticipate the
problem with the producer id cache. I do wonder if we can make the default
a bit lower to reduce the chances that users will hit the same memory
issues we have seen. After decoupling this configuration from
transactional.id.expiration.ms, the new timeout just needs to cover the
longest duration that a producer might be retrying the same Produce
request. 7 days seems too high. Although I think it could go a fair even
lower, perhaps 1 day is a reasonable place to start?

Thanks,
Jason

On Mon, Jul 25, 2022 at 9:25 AM Justine Olshan <jo...@confluent.io.invalid>
wrote:

> Hey Bill,
> Thanks! I was just going to say that hopefully
> transactional.id.expiration.ms would also be over the delivery timeout. :)
> Thanks for the +1!
>
> Justine
>
> On Mon, Jul 25, 2022 at 9:17 AM Bill Bejeck <bb...@gmail.com> wrote:
>
> > Hi Justine,
> >
> > I just took another look at the KIP, and I realize my question/suggestion
> > about default values has already been addressed in the `Compatibility`
> > section.
> >
> > I'm +1 on the KIP.
> >
> > -Bill
> >
> > On Thu, Jul 21, 2022 at 6:20 PM Bill Bejeck <bb...@gmail.com> wrote:
> >
> > > Hi Justine,
> > >
> > > Thanks for the well written KIP, this looks like it will be a useful
> > > addition.
> > >
> > > Overall the KIP looks good to me, I have one question/comment.
> > >
> > > You mentioned that setting the `producer.id.expiration.ms` less than
> the
> > > delivery timeout could lead to duplicates, which makes sense.  To help
> > > avoid this situation, do we want to consider a default value that is
> the
> > > same as the delivery timeout?
> > >
> > > Thanks again for the KIP.
> > >
> > > Bill
> > >
> > > On Thu, Jul 21, 2022 at 4:54 PM Justine Olshan
> > > <jo...@confluent.io.invalid> wrote:
> > >
> > >> Hey all!
> > >>
> > >> I'd like to start a discussion on my proposal to separate time-based
> > >> producer ID expiration from transactional ID expiration by
> introducing a
> > >> new configuration.
> > >>
> > >> The KIP Is pretty small and simple, but will be helpful in controlling
> > >> memory usage in brokers -- especially now that by default producers
> are
> > >> idempotent and create producer ID state.
> > >>
> > >> Please take a look and leave any comments you may have!
> > >>
> > >> KIP:
> > >>
> > >>
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-854+Separate+configuration+for+producer+ID+expiry
> > >> JIRA: https://issues.apache.org/jira/browse/KAFKA-14097
> > >>
> > >> Thanks!
> > >> Justine
> > >>
> > >
> >
>

Re: [DISCUSS] KIP-854 Separate configuration for producer ID expiry

Posted by Justine Olshan <jo...@confluent.io.INVALID>.
Hey Bill,
Thanks! I was just going to say that hopefully
transactional.id.expiration.ms would also be over the delivery timeout. :)
Thanks for the +1!

Justine

On Mon, Jul 25, 2022 at 9:17 AM Bill Bejeck <bb...@gmail.com> wrote:

> Hi Justine,
>
> I just took another look at the KIP, and I realize my question/suggestion
> about default values has already been addressed in the `Compatibility`
> section.
>
> I'm +1 on the KIP.
>
> -Bill
>
> On Thu, Jul 21, 2022 at 6:20 PM Bill Bejeck <bb...@gmail.com> wrote:
>
> > Hi Justine,
> >
> > Thanks for the well written KIP, this looks like it will be a useful
> > addition.
> >
> > Overall the KIP looks good to me, I have one question/comment.
> >
> > You mentioned that setting the `producer.id.expiration.ms` less than the
> > delivery timeout could lead to duplicates, which makes sense.  To help
> > avoid this situation, do we want to consider a default value that is the
> > same as the delivery timeout?
> >
> > Thanks again for the KIP.
> >
> > Bill
> >
> > On Thu, Jul 21, 2022 at 4:54 PM Justine Olshan
> > <jo...@confluent.io.invalid> wrote:
> >
> >> Hey all!
> >>
> >> I'd like to start a discussion on my proposal to separate time-based
> >> producer ID expiration from transactional ID expiration by introducing a
> >> new configuration.
> >>
> >> The KIP Is pretty small and simple, but will be helpful in controlling
> >> memory usage in brokers -- especially now that by default producers are
> >> idempotent and create producer ID state.
> >>
> >> Please take a look and leave any comments you may have!
> >>
> >> KIP:
> >>
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-854+Separate+configuration+for+producer+ID+expiry
> >> JIRA: https://issues.apache.org/jira/browse/KAFKA-14097
> >>
> >> Thanks!
> >> Justine
> >>
> >
>

Re: [DISCUSS] KIP-854 Separate configuration for producer ID expiry

Posted by Bill Bejeck <bb...@gmail.com>.
Hi Justine,

I just took another look at the KIP, and I realize my question/suggestion
about default values has already been addressed in the `Compatibility`
section.

I'm +1 on the KIP.

-Bill

On Thu, Jul 21, 2022 at 6:20 PM Bill Bejeck <bb...@gmail.com> wrote:

> Hi Justine,
>
> Thanks for the well written KIP, this looks like it will be a useful
> addition.
>
> Overall the KIP looks good to me, I have one question/comment.
>
> You mentioned that setting the `producer.id.expiration.ms` less than the
> delivery timeout could lead to duplicates, which makes sense.  To help
> avoid this situation, do we want to consider a default value that is the
> same as the delivery timeout?
>
> Thanks again for the KIP.
>
> Bill
>
> On Thu, Jul 21, 2022 at 4:54 PM Justine Olshan
> <jo...@confluent.io.invalid> wrote:
>
>> Hey all!
>>
>> I'd like to start a discussion on my proposal to separate time-based
>> producer ID expiration from transactional ID expiration by introducing a
>> new configuration.
>>
>> The KIP Is pretty small and simple, but will be helpful in controlling
>> memory usage in brokers -- especially now that by default producers are
>> idempotent and create producer ID state.
>>
>> Please take a look and leave any comments you may have!
>>
>> KIP:
>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-854+Separate+configuration+for+producer+ID+expiry
>> JIRA: https://issues.apache.org/jira/browse/KAFKA-14097
>>
>> Thanks!
>> Justine
>>
>

Re: [DISCUSS] KIP-854 Separate configuration for producer ID expiry

Posted by Bill Bejeck <bb...@gmail.com>.
Hi Justine,

Thanks for the well written KIP, this looks like it will be a useful
addition.

Overall the KIP looks good to me, I have one question/comment.

You mentioned that setting the `producer.id.expiration.ms` less than the
delivery timeout could lead to duplicates, which makes sense.  To help
avoid this situation, do we want to consider a default value that is the
same as the delivery timeout?

Thanks again for the KIP.

Bill

On Thu, Jul 21, 2022 at 4:54 PM Justine Olshan <jo...@confluent.io.invalid>
wrote:

> Hey all!
>
> I'd like to start a discussion on my proposal to separate time-based
> producer ID expiration from transactional ID expiration by introducing a
> new configuration.
>
> The KIP Is pretty small and simple, but will be helpful in controlling
> memory usage in brokers -- especially now that by default producers are
> idempotent and create producer ID state.
>
> Please take a look and leave any comments you may have!
>
> KIP:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-854+Separate+configuration+for+producer+ID+expiry
> JIRA: https://issues.apache.org/jira/browse/KAFKA-14097
>
> Thanks!
> Justine
>