You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kafka.apache.org by Maarten Duijn <ma...@msn.com> on 2019/04/02 15:28:31 UTC

[DISCUSS] KIP-446: Add changelog topic configuration to KTable suppress

Kafka Streams currently does not allow configuring the internal changelog
topic created by KTable.suppress. This KIP introduces a design for adding
topic configurations to the suppress API.

https://cwiki.apache.org/confluence/display/KAFKA/KIP-446%3A+Add+changelog+topic+configuration+to+KTable+suppress


Re: [DISCUSS] KIP-446: Add changelog topic configuration to KTable suppress

Posted by John Roesler <jo...@confluent.io>.
Thanks all, for your comments!

I'm on board with the consensus, so it sounds like your KIP is fine as-is,
Maarten. Would you like to start the vote thread?

(just look at the mailing list for other examples)

Thanks,
-John

On Sat, Apr 13, 2019 at 5:26 PM Matthias J. Sax <ma...@confluent.io>
wrote:

> > Are you sure the users are aware that with `withLoggingDisabled()`, they
> > might lose data during failover?
>
> I hope so :D
>
> Of course, we can always improve the JavaDocs.
>
>
> -Matthias
>
>
> On 4/12/19 2:48 PM, Bill Bejeck wrote:
> > Thanks for the KIP Maarten.
> >
> > I also agree that keeping the `withLoggingDisabled()` and
> > `withLoggingEnabled(Map)` methods is the better option.
> >
> > When it comes to educating the users on the downside of disabling
> logging,
> > IMHO I think a comment in the JavaDoc should be sufficient.
> >
> > -Bill
> >
> > On Fri, Apr 12, 2019 at 3:59 PM Bruno Cadonna <br...@confluent.io>
> wrote:
> >
> >> Matthias,
> >>
> >> Are you sure the users are aware that with `withLoggingDisabled()`, they
> >> might lose data during failover?
> >>
> >> OK, we maybe do not necessarily need a WARN log. However, I would at
> least
> >> add a comment like in `StoreBuilder`,ie,
> >>
> >> /**
> >> * Disable the changelog for store built by this {@link StoreBuilder}.
> >> * This will turn off fault-tolerance for your store.
> >> * By default the changelog is enabled.
> >> * @return this
> >> */
> >> StoreBuilder<T> withLoggingDisabled();
> >>
> >> What do you think?
> >>
> >> Best,
> >> Bruno
> >>
> >> On Thu, Apr 11, 2019 at 12:04 AM Matthias J. Sax <matthias@confluent.io
> >
> >> wrote:
> >>
> >>> I think that the current proposal to add `withLoggingDisabled()` and
> >>> `withLoggingEnabled(Map)` should be the best option.
> >>>
> >>> IMHO there is no reason to add a WARN log. We also don't have a WARN
> log
> >>> when people disable logging on regular stores. As Bruno mentioned, this
> >>> might also lead to data loss, so I don't see why we should treat
> >>> suppress() different to other stores.
> >>>
> >>>
> >>> -Matthias
> >>>
> >>> On 4/10/19 3:36 PM, Bruno Cadonna wrote:
> >>>> Hi Marteen and John,
> >>>>
> >>>> I would opt for option 1 with an additional log message on INFO or
> WARN
> >>>> level, since the log file is the place where you would look first to
> >>>> understand what went wrong. I would also not adjust it when
> persistence
> >>>> stores are available for suppress.
> >>>>
> >>>> I would not go for option 2 or 3, because IIUC, with
> >>>> `withLoggingDisabled()` also persistent state stores do not guarantee
> >> not
> >>>> to loose records. Persisting state stores is basically a way to
> >> optimize
> >>>> recovery in certain cases. The changelog topic is the component that
> >>>> guarantees no data loss. So regarding data loss, in my opinion,
> >> disabling
> >>>> logging on the suppression buffer is not different from disabling
> >> logging
> >>>> on other state stores. Please correct me if I am wrong.
> >>>>
> >>>> Best,
> >>>> Bruno
> >>>>
> >>>> On Wed, Apr 10, 2019 at 12:12 PM John Roesler <jo...@confluent.io>
> >> wrote:
> >>>>
> >>>>> Thanks for the update and comments, Maarten. It would be interesting
> >> to
> >>>>> hear what others think as well.
> >>>>> -John
> >>>>>
> >>>>> On Thu, Apr 4, 2019 at 2:43 PM Maarten Duijn <ma...@msn.com>
> >>> wrote:
> >>>>>
> >>>>>> Thank you for the explanation regarding the internals, I have edited
> >>> the
> >>>>>> KIP accordingly and updated the Javadoc. About the possible data
> loss
> >>>>> when
> >>>>>> altering changelog config, I think we can improve by doing (one of)
> >> the
> >>>>>> following.
> >>>>>>
> >>>>>> 1) Add a warning in the comments that clearly states what might
> >> happen
> >>>>>> when change logging is disabled and adjust it when persistent stores
> >>> are
> >>>>>> added.
> >>>>>>
> >>>>>> 2) Change `withLoggingDisabled` to `minimizeLogging`. Instead of
> >>>>> disabling
> >>>>>> logging, a call to this method minimizes the topic size by
> >> aggressively
> >>>>>> removing the records emitted downstream by the suppress operator. I
> >>>>> believe
> >>>>>> this can be achieved by setting `delete.retention.ms=0` in the
> topic
> >>>>>> config.
> >>>>>>
> >>>>>> 3) Remove `withLoggingDisabled` from the proposal.
> >>>>>>
> >>>>>> 4) Leave both methods as-proposed, as you indicated, this is in line
> >>> with
> >>>>>> the other parts of the Streams API
> >>>>>>
> >>>>>> A user might want to disable logging when downstream is not a Kafka
> >>> topic
> >>>>>> but some other service that does not benefit from
> >> atleast-once-delivery
> >>>>> of
> >>>>>> the suppressed records in case of failover or rebalance.
> >>>>>> Seeing as it might cause data loss, the methods should not be used
> >>>>> lightly
> >>>>>> and I think some comments are warranted. Personally, I rely purely
> on
> >>>>> Kafka
> >>>>>> to prevent data loss even when a store persisted locally, so when
> >>> support
> >>>>>> is added for persistent suppression, I feel the comments may stay.
> >>>>>>
> >>>>>> Maarten
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>>
> >>
> >
>
>

Re: [DISCUSS] KIP-446: Add changelog topic configuration to KTable suppress

Posted by "Matthias J. Sax" <ma...@confluent.io>.
> Are you sure the users are aware that with `withLoggingDisabled()`, they
> might lose data during failover?

I hope so :D

Of course, we can always improve the JavaDocs.


-Matthias


On 4/12/19 2:48 PM, Bill Bejeck wrote:
> Thanks for the KIP Maarten.
> 
> I also agree that keeping the `withLoggingDisabled()` and
> `withLoggingEnabled(Map)` methods is the better option.
> 
> When it comes to educating the users on the downside of disabling logging,
> IMHO I think a comment in the JavaDoc should be sufficient.
> 
> -Bill
> 
> On Fri, Apr 12, 2019 at 3:59 PM Bruno Cadonna <br...@confluent.io> wrote:
> 
>> Matthias,
>>
>> Are you sure the users are aware that with `withLoggingDisabled()`, they
>> might lose data during failover?
>>
>> OK, we maybe do not necessarily need a WARN log. However, I would at least
>> add a comment like in `StoreBuilder`,ie,
>>
>> /**
>> * Disable the changelog for store built by this {@link StoreBuilder}.
>> * This will turn off fault-tolerance for your store.
>> * By default the changelog is enabled.
>> * @return this
>> */
>> StoreBuilder<T> withLoggingDisabled();
>>
>> What do you think?
>>
>> Best,
>> Bruno
>>
>> On Thu, Apr 11, 2019 at 12:04 AM Matthias J. Sax <ma...@confluent.io>
>> wrote:
>>
>>> I think that the current proposal to add `withLoggingDisabled()` and
>>> `withLoggingEnabled(Map)` should be the best option.
>>>
>>> IMHO there is no reason to add a WARN log. We also don't have a WARN log
>>> when people disable logging on regular stores. As Bruno mentioned, this
>>> might also lead to data loss, so I don't see why we should treat
>>> suppress() different to other stores.
>>>
>>>
>>> -Matthias
>>>
>>> On 4/10/19 3:36 PM, Bruno Cadonna wrote:
>>>> Hi Marteen and John,
>>>>
>>>> I would opt for option 1 with an additional log message on INFO or WARN
>>>> level, since the log file is the place where you would look first to
>>>> understand what went wrong. I would also not adjust it when persistence
>>>> stores are available for suppress.
>>>>
>>>> I would not go for option 2 or 3, because IIUC, with
>>>> `withLoggingDisabled()` also persistent state stores do not guarantee
>> not
>>>> to loose records. Persisting state stores is basically a way to
>> optimize
>>>> recovery in certain cases. The changelog topic is the component that
>>>> guarantees no data loss. So regarding data loss, in my opinion,
>> disabling
>>>> logging on the suppression buffer is not different from disabling
>> logging
>>>> on other state stores. Please correct me if I am wrong.
>>>>
>>>> Best,
>>>> Bruno
>>>>
>>>> On Wed, Apr 10, 2019 at 12:12 PM John Roesler <jo...@confluent.io>
>> wrote:
>>>>
>>>>> Thanks for the update and comments, Maarten. It would be interesting
>> to
>>>>> hear what others think as well.
>>>>> -John
>>>>>
>>>>> On Thu, Apr 4, 2019 at 2:43 PM Maarten Duijn <ma...@msn.com>
>>> wrote:
>>>>>
>>>>>> Thank you for the explanation regarding the internals, I have edited
>>> the
>>>>>> KIP accordingly and updated the Javadoc. About the possible data loss
>>>>> when
>>>>>> altering changelog config, I think we can improve by doing (one of)
>> the
>>>>>> following.
>>>>>>
>>>>>> 1) Add a warning in the comments that clearly states what might
>> happen
>>>>>> when change logging is disabled and adjust it when persistent stores
>>> are
>>>>>> added.
>>>>>>
>>>>>> 2) Change `withLoggingDisabled` to `minimizeLogging`. Instead of
>>>>> disabling
>>>>>> logging, a call to this method minimizes the topic size by
>> aggressively
>>>>>> removing the records emitted downstream by the suppress operator. I
>>>>> believe
>>>>>> this can be achieved by setting `delete.retention.ms=0` in the topic
>>>>>> config.
>>>>>>
>>>>>> 3) Remove `withLoggingDisabled` from the proposal.
>>>>>>
>>>>>> 4) Leave both methods as-proposed, as you indicated, this is in line
>>> with
>>>>>> the other parts of the Streams API
>>>>>>
>>>>>> A user might want to disable logging when downstream is not a Kafka
>>> topic
>>>>>> but some other service that does not benefit from
>> atleast-once-delivery
>>>>> of
>>>>>> the suppressed records in case of failover or rebalance.
>>>>>> Seeing as it might cause data loss, the methods should not be used
>>>>> lightly
>>>>>> and I think some comments are warranted. Personally, I rely purely on
>>>>> Kafka
>>>>>> to prevent data loss even when a store persisted locally, so when
>>> support
>>>>>> is added for persistent suppression, I feel the comments may stay.
>>>>>>
>>>>>> Maarten
>>>>>>
>>>>>
>>>>
>>>
>>>
>>
> 


Re: [DISCUSS] KIP-446: Add changelog topic configuration to KTable suppress

Posted by Bill Bejeck <bb...@gmail.com>.
Thanks for the KIP Maarten.

I also agree that keeping the `withLoggingDisabled()` and
`withLoggingEnabled(Map)` methods is the better option.

When it comes to educating the users on the downside of disabling logging,
IMHO I think a comment in the JavaDoc should be sufficient.

-Bill

On Fri, Apr 12, 2019 at 3:59 PM Bruno Cadonna <br...@confluent.io> wrote:

> Matthias,
>
> Are you sure the users are aware that with `withLoggingDisabled()`, they
> might lose data during failover?
>
> OK, we maybe do not necessarily need a WARN log. However, I would at least
> add a comment like in `StoreBuilder`,ie,
>
> /**
> * Disable the changelog for store built by this {@link StoreBuilder}.
> * This will turn off fault-tolerance for your store.
> * By default the changelog is enabled.
> * @return this
> */
> StoreBuilder<T> withLoggingDisabled();
>
> What do you think?
>
> Best,
> Bruno
>
> On Thu, Apr 11, 2019 at 12:04 AM Matthias J. Sax <ma...@confluent.io>
> wrote:
>
> > I think that the current proposal to add `withLoggingDisabled()` and
> > `withLoggingEnabled(Map)` should be the best option.
> >
> > IMHO there is no reason to add a WARN log. We also don't have a WARN log
> > when people disable logging on regular stores. As Bruno mentioned, this
> > might also lead to data loss, so I don't see why we should treat
> > suppress() different to other stores.
> >
> >
> > -Matthias
> >
> > On 4/10/19 3:36 PM, Bruno Cadonna wrote:
> > > Hi Marteen and John,
> > >
> > > I would opt for option 1 with an additional log message on INFO or WARN
> > > level, since the log file is the place where you would look first to
> > > understand what went wrong. I would also not adjust it when persistence
> > > stores are available for suppress.
> > >
> > > I would not go for option 2 or 3, because IIUC, with
> > > `withLoggingDisabled()` also persistent state stores do not guarantee
> not
> > > to loose records. Persisting state stores is basically a way to
> optimize
> > > recovery in certain cases. The changelog topic is the component that
> > > guarantees no data loss. So regarding data loss, in my opinion,
> disabling
> > > logging on the suppression buffer is not different from disabling
> logging
> > > on other state stores. Please correct me if I am wrong.
> > >
> > > Best,
> > > Bruno
> > >
> > > On Wed, Apr 10, 2019 at 12:12 PM John Roesler <jo...@confluent.io>
> wrote:
> > >
> > >> Thanks for the update and comments, Maarten. It would be interesting
> to
> > >> hear what others think as well.
> > >> -John
> > >>
> > >> On Thu, Apr 4, 2019 at 2:43 PM Maarten Duijn <ma...@msn.com>
> > wrote:
> > >>
> > >>> Thank you for the explanation regarding the internals, I have edited
> > the
> > >>> KIP accordingly and updated the Javadoc. About the possible data loss
> > >> when
> > >>> altering changelog config, I think we can improve by doing (one of)
> the
> > >>> following.
> > >>>
> > >>> 1) Add a warning in the comments that clearly states what might
> happen
> > >>> when change logging is disabled and adjust it when persistent stores
> > are
> > >>> added.
> > >>>
> > >>> 2) Change `withLoggingDisabled` to `minimizeLogging`. Instead of
> > >> disabling
> > >>> logging, a call to this method minimizes the topic size by
> aggressively
> > >>> removing the records emitted downstream by the suppress operator. I
> > >> believe
> > >>> this can be achieved by setting `delete.retention.ms=0` in the topic
> > >>> config.
> > >>>
> > >>> 3) Remove `withLoggingDisabled` from the proposal.
> > >>>
> > >>> 4) Leave both methods as-proposed, as you indicated, this is in line
> > with
> > >>> the other parts of the Streams API
> > >>>
> > >>> A user might want to disable logging when downstream is not a Kafka
> > topic
> > >>> but some other service that does not benefit from
> atleast-once-delivery
> > >> of
> > >>> the suppressed records in case of failover or rebalance.
> > >>> Seeing as it might cause data loss, the methods should not be used
> > >> lightly
> > >>> and I think some comments are warranted. Personally, I rely purely on
> > >> Kafka
> > >>> to prevent data loss even when a store persisted locally, so when
> > support
> > >>> is added for persistent suppression, I feel the comments may stay.
> > >>>
> > >>> Maarten
> > >>>
> > >>
> > >
> >
> >
>

Re: [DISCUSS] KIP-446: Add changelog topic configuration to KTable suppress

Posted by Bruno Cadonna <br...@confluent.io>.
Matthias,

Are you sure the users are aware that with `withLoggingDisabled()`, they
might lose data during failover?

OK, we maybe do not necessarily need a WARN log. However, I would at least
add a comment like in `StoreBuilder`,ie,

/**
* Disable the changelog for store built by this {@link StoreBuilder}.
* This will turn off fault-tolerance for your store.
* By default the changelog is enabled.
* @return this
*/
StoreBuilder<T> withLoggingDisabled();

What do you think?

Best,
Bruno

On Thu, Apr 11, 2019 at 12:04 AM Matthias J. Sax <ma...@confluent.io>
wrote:

> I think that the current proposal to add `withLoggingDisabled()` and
> `withLoggingEnabled(Map)` should be the best option.
>
> IMHO there is no reason to add a WARN log. We also don't have a WARN log
> when people disable logging on regular stores. As Bruno mentioned, this
> might also lead to data loss, so I don't see why we should treat
> suppress() different to other stores.
>
>
> -Matthias
>
> On 4/10/19 3:36 PM, Bruno Cadonna wrote:
> > Hi Marteen and John,
> >
> > I would opt for option 1 with an additional log message on INFO or WARN
> > level, since the log file is the place where you would look first to
> > understand what went wrong. I would also not adjust it when persistence
> > stores are available for suppress.
> >
> > I would not go for option 2 or 3, because IIUC, with
> > `withLoggingDisabled()` also persistent state stores do not guarantee not
> > to loose records. Persisting state stores is basically a way to optimize
> > recovery in certain cases. The changelog topic is the component that
> > guarantees no data loss. So regarding data loss, in my opinion, disabling
> > logging on the suppression buffer is not different from disabling logging
> > on other state stores. Please correct me if I am wrong.
> >
> > Best,
> > Bruno
> >
> > On Wed, Apr 10, 2019 at 12:12 PM John Roesler <jo...@confluent.io> wrote:
> >
> >> Thanks for the update and comments, Maarten. It would be interesting to
> >> hear what others think as well.
> >> -John
> >>
> >> On Thu, Apr 4, 2019 at 2:43 PM Maarten Duijn <ma...@msn.com>
> wrote:
> >>
> >>> Thank you for the explanation regarding the internals, I have edited
> the
> >>> KIP accordingly and updated the Javadoc. About the possible data loss
> >> when
> >>> altering changelog config, I think we can improve by doing (one of) the
> >>> following.
> >>>
> >>> 1) Add a warning in the comments that clearly states what might happen
> >>> when change logging is disabled and adjust it when persistent stores
> are
> >>> added.
> >>>
> >>> 2) Change `withLoggingDisabled` to `minimizeLogging`. Instead of
> >> disabling
> >>> logging, a call to this method minimizes the topic size by aggressively
> >>> removing the records emitted downstream by the suppress operator. I
> >> believe
> >>> this can be achieved by setting `delete.retention.ms=0` in the topic
> >>> config.
> >>>
> >>> 3) Remove `withLoggingDisabled` from the proposal.
> >>>
> >>> 4) Leave both methods as-proposed, as you indicated, this is in line
> with
> >>> the other parts of the Streams API
> >>>
> >>> A user might want to disable logging when downstream is not a Kafka
> topic
> >>> but some other service that does not benefit from atleast-once-delivery
> >> of
> >>> the suppressed records in case of failover or rebalance.
> >>> Seeing as it might cause data loss, the methods should not be used
> >> lightly
> >>> and I think some comments are warranted. Personally, I rely purely on
> >> Kafka
> >>> to prevent data loss even when a store persisted locally, so when
> support
> >>> is added for persistent suppression, I feel the comments may stay.
> >>>
> >>> Maarten
> >>>
> >>
> >
>
>

Re: [DISCUSS] KIP-446: Add changelog topic configuration to KTable suppress

Posted by "Matthias J. Sax" <ma...@confluent.io>.
I think that the current proposal to add `withLoggingDisabled()` and
`withLoggingEnabled(Map)` should be the best option.

IMHO there is no reason to add a WARN log. We also don't have a WARN log
when people disable logging on regular stores. As Bruno mentioned, this
might also lead to data loss, so I don't see why we should treat
suppress() different to other stores.


-Matthias

On 4/10/19 3:36 PM, Bruno Cadonna wrote:
> Hi Marteen and John,
> 
> I would opt for option 1 with an additional log message on INFO or WARN
> level, since the log file is the place where you would look first to
> understand what went wrong. I would also not adjust it when persistence
> stores are available for suppress.
> 
> I would not go for option 2 or 3, because IIUC, with
> `withLoggingDisabled()` also persistent state stores do not guarantee not
> to loose records. Persisting state stores is basically a way to optimize
> recovery in certain cases. The changelog topic is the component that
> guarantees no data loss. So regarding data loss, in my opinion, disabling
> logging on the suppression buffer is not different from disabling logging
> on other state stores. Please correct me if I am wrong.
> 
> Best,
> Bruno
> 
> On Wed, Apr 10, 2019 at 12:12 PM John Roesler <jo...@confluent.io> wrote:
> 
>> Thanks for the update and comments, Maarten. It would be interesting to
>> hear what others think as well.
>> -John
>>
>> On Thu, Apr 4, 2019 at 2:43 PM Maarten Duijn <ma...@msn.com> wrote:
>>
>>> Thank you for the explanation regarding the internals, I have edited the
>>> KIP accordingly and updated the Javadoc. About the possible data loss
>> when
>>> altering changelog config, I think we can improve by doing (one of) the
>>> following.
>>>
>>> 1) Add a warning in the comments that clearly states what might happen
>>> when change logging is disabled and adjust it when persistent stores are
>>> added.
>>>
>>> 2) Change `withLoggingDisabled` to `minimizeLogging`. Instead of
>> disabling
>>> logging, a call to this method minimizes the topic size by aggressively
>>> removing the records emitted downstream by the suppress operator. I
>> believe
>>> this can be achieved by setting `delete.retention.ms=0` in the topic
>>> config.
>>>
>>> 3) Remove `withLoggingDisabled` from the proposal.
>>>
>>> 4) Leave both methods as-proposed, as you indicated, this is in line with
>>> the other parts of the Streams API
>>>
>>> A user might want to disable logging when downstream is not a Kafka topic
>>> but some other service that does not benefit from atleast-once-delivery
>> of
>>> the suppressed records in case of failover or rebalance.
>>> Seeing as it might cause data loss, the methods should not be used
>> lightly
>>> and I think some comments are warranted. Personally, I rely purely on
>> Kafka
>>> to prevent data loss even when a store persisted locally, so when support
>>> is added for persistent suppression, I feel the comments may stay.
>>>
>>> Maarten
>>>
>>
> 


Re: [DISCUSS] KIP-446: Add changelog topic configuration to KTable suppress

Posted by Bruno Cadonna <br...@confluent.io>.
Hi Marteen and John,

I would opt for option 1 with an additional log message on INFO or WARN
level, since the log file is the place where you would look first to
understand what went wrong. I would also not adjust it when persistence
stores are available for suppress.

I would not go for option 2 or 3, because IIUC, with
`withLoggingDisabled()` also persistent state stores do not guarantee not
to loose records. Persisting state stores is basically a way to optimize
recovery in certain cases. The changelog topic is the component that
guarantees no data loss. So regarding data loss, in my opinion, disabling
logging on the suppression buffer is not different from disabling logging
on other state stores. Please correct me if I am wrong.

Best,
Bruno

On Wed, Apr 10, 2019 at 12:12 PM John Roesler <jo...@confluent.io> wrote:

> Thanks for the update and comments, Maarten. It would be interesting to
> hear what others think as well.
> -John
>
> On Thu, Apr 4, 2019 at 2:43 PM Maarten Duijn <ma...@msn.com> wrote:
>
> > Thank you for the explanation regarding the internals, I have edited the
> > KIP accordingly and updated the Javadoc. About the possible data loss
> when
> > altering changelog config, I think we can improve by doing (one of) the
> > following.
> >
> > 1) Add a warning in the comments that clearly states what might happen
> > when change logging is disabled and adjust it when persistent stores are
> > added.
> >
> > 2) Change `withLoggingDisabled` to `minimizeLogging`. Instead of
> disabling
> > logging, a call to this method minimizes the topic size by aggressively
> > removing the records emitted downstream by the suppress operator. I
> believe
> > this can be achieved by setting `delete.retention.ms=0` in the topic
> > config.
> >
> > 3) Remove `withLoggingDisabled` from the proposal.
> >
> > 4) Leave both methods as-proposed, as you indicated, this is in line with
> > the other parts of the Streams API
> >
> > A user might want to disable logging when downstream is not a Kafka topic
> > but some other service that does not benefit from atleast-once-delivery
> of
> > the suppressed records in case of failover or rebalance.
> > Seeing as it might cause data loss, the methods should not be used
> lightly
> > and I think some comments are warranted. Personally, I rely purely on
> Kafka
> > to prevent data loss even when a store persisted locally, so when support
> > is added for persistent suppression, I feel the comments may stay.
> >
> > Maarten
> >
>

Re: [DISCUSS] KIP-446: Add changelog topic configuration to KTable suppress

Posted by John Roesler <jo...@confluent.io>.
Thanks for the update and comments, Maarten. It would be interesting to
hear what others think as well.
-John

On Thu, Apr 4, 2019 at 2:43 PM Maarten Duijn <ma...@msn.com> wrote:

> Thank you for the explanation regarding the internals, I have edited the
> KIP accordingly and updated the Javadoc. About the possible data loss when
> altering changelog config, I think we can improve by doing (one of) the
> following.
>
> 1) Add a warning in the comments that clearly states what might happen
> when change logging is disabled and adjust it when persistent stores are
> added.
>
> 2) Change `withLoggingDisabled` to `minimizeLogging`. Instead of disabling
> logging, a call to this method minimizes the topic size by aggressively
> removing the records emitted downstream by the suppress operator. I believe
> this can be achieved by setting `delete.retention.ms=0` in the topic
> config.
>
> 3) Remove `withLoggingDisabled` from the proposal.
>
> 4) Leave both methods as-proposed, as you indicated, this is in line with
> the other parts of the Streams API
>
> A user might want to disable logging when downstream is not a Kafka topic
> but some other service that does not benefit from atleast-once-delivery of
> the suppressed records in case of failover or rebalance.
> Seeing as it might cause data loss, the methods should not be used lightly
> and I think some comments are warranted. Personally, I rely purely on Kafka
> to prevent data loss even when a store persisted locally, so when support
> is added for persistent suppression, I feel the comments may stay.
>
> Maarten
>

Re: [DISCUSS] KIP-446: Add changelog topic configuration to KTable suppress

Posted by Maarten Duijn <ma...@msn.com>.
Thank you for the explanation regarding the internals, I have edited the KIP accordingly and updated the Javadoc. About the possible data loss when altering changelog config, I think we can improve by doing (one of) the following.

1) Add a warning in the comments that clearly states what might happen when change logging is disabled and adjust it when persistent stores are added.

2) Change `withLoggingDisabled` to `minimizeLogging`. Instead of disabling logging, a call to this method minimizes the topic size by aggressively removing the records emitted downstream by the suppress operator. I believe this can be achieved by setting `delete.retention.ms=0` in the topic config.

3) Remove `withLoggingDisabled` from the proposal.

4) Leave both methods as-proposed, as you indicated, this is in line with the other parts of the Streams API

A user might want to disable logging when downstream is not a Kafka topic but some other service that does not benefit from atleast-once-delivery of the suppressed records in case of failover or rebalance.
Seeing as it might cause data loss, the methods should not be used lightly and I think some comments are warranted. Personally, I rely purely on Kafka to prevent data loss even when a store persisted locally, so when support is added for persistent suppression, I feel the comments may stay.

Maarten

Re: [DISCUSS] KIP-446: Add changelog topic configuration to KTable suppress

Posted by John Roesler <jo...@confluent.io>.
Hi Maarten,

Thanks for the KIP!

It looks good to me. It seems appropriate to stick close to the same API
presented by Materialized.

I did notice one mis-statement in the proposed Javadoc:
> * Indicates that a changelog should be created for the suppressed KTable.
Actually, the changelog is for the buffer only, not the KTable. This may be
an important distinction for some configurations that folks might set. When
the suppress operator gets data from upstream, it adds it to the buffer
(and also sends that data to the changelog). When it emits records from the
buffer, it also sends a tombstone to the changelog topic. Thus, data in the
buffer is *much* shorter lived than data in other KTables, and the buffer
changelog also may be much more compactable than KTable changelogs. I only
bring this up because it might lead people to configure the changelog topic
differently.

One question that I have is whether we should place any restriction on the
use of these methods. Since there is currently only an in-memory
implementation of the suppression buffer, disabling the changelog is almost
guaranteed to lead to data loss during rebalances or restarts. Then again,
the same thing is true if you use an in-memory state store with changelog
disabled. Thoughts?

Thanks again for the KIP. It'll be great to get this in.

-John

On Tue, Apr 2, 2019 at 4:39 PM Maarten Duijn <ma...@msn.com> wrote:

> Kafka Streams currently does not allow configuring the internal changelog
> topic created by KTable.suppress. This KIP introduces a design for adding
> topic configurations to the suppress API.
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-446%3A+Add+changelog+topic+configuration+to+KTable+suppress
>
>