You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kafka.apache.org by "Matthias J. Sax" <ma...@confluent.io> on 2016/11/18 20:27:02 UTC

[DISCUSS] KIP-93: Improve invalid timestamp handling in Kafka Streams

Hi all,

I want to start a discussion about KIP-93:

https://cwiki.apache.org/confluence/display/KAFKA/KIP-93%3A+Improve+invalid+timestamp+handling+in+Kafka+Streams

Looking forward to your feedback.


-Matthias


Re: [DISCUSS] KIP-93: Improve invalid timestamp handling in Kafka Streams

Posted by "Matthias J. Sax" <ma...@confluent.io>.
Yes! Sorry for that typo.

On 12/1/16 5:07 PM, Guozhang Wang wrote:
> You mean "it is a backward incompatible change" right?
> 
> On Wed, Nov 30, 2016 at 4:28 PM, Matthias J. Sax <ma...@confluent.io>
> wrote:
> 
>> Thanks for this clarification (and your +1)
>>
>> I completely agree and just want to add my thoughts:
>>
>> 1. Yes, it is a backward compatible change but as I discusses with
>> others, we want accept this for now. All previous releases did contain
>> non-compatible changes for Kafka Streams, too. And as Kafka Streams API
>> is not guaranteed to be stable at this point, we better do breaking
>> changes now than later.
>>
>> 2. At some point, we need to be more conservative with breaking chances
>> and only allow them for major releases.
>>
>> 3. As we expect that most people do use default timestamp extractor,
>> they will not be effected anyway. Only if custom extractors are used,
>> the application needs to be recompiled. Thus, the effort to make the
>> change backward compatible seems not to be worth the effort.
>>
>>
>> -Matthias
>>
>> On 11/29/16 9:57 PM, Ewen Cheslack-Postava wrote:
>>> I think this looks reasonable, but just a more general note on
>>> compatibility -- I think it's worth trying to clarify what types of
>>> compatibility we're trying to achieve. Guozhang's 1 & 2 give an important
>>> breakdown (compile vs runtime compatibility). For the type of change
>>> described here, I think it makes sense to clarify the compatibility
>> goals.
>>> The (pure) compile time compatibility vs (pure) runtime compatibility
>>> aren't the only options -- you have some additional intermediate choices
>> as
>>> well.
>>>
>>> The proposal describes a change which requires recompiling the plugin
>>> (TimestampExtractor) *and* substituting a runtime library (kafka-streams
>>> jar) to get correct updated behavior. This can get interesting if you
>>> already have N streams apps sharing the same TimestampExtractor. You now
>>> *must* update all of them to the new streams jar if any are to be updated
>>> for the new TimestampExtractor API. For folks with a monolithic
>>> repo/versioning setup, this could potentially be painful since they're
>>> forced to update all apps at once. It's at least not too bad since it can
>>> be isolated to a single commit (without deployment needing to be
>>> coordinated, for example), but if the # of apps gets > 4 or 5, these
>> types
>>> of updates start to be a real pain.
>>>
>>> I think this API change is an acceptable (albeit annoying) API
>>> incompatibility right now, but wanted to raise this in the discussion of
>>> this KIP so we consider this moving forward. There definitely are
>>> alternatives that add the new functionality but maintain compatibility
>>> better. In particular, it's possible to define the new interface to
>> require
>>> both APIs:
>>>
>>> // new interface
>>> public interface TimestampExtractor {
>>>     long extract(ConsumerRecord<Object, Object> record);
>>>     long extract(ConsumerRecord<Object, Object> record, long
>>> previousTimestamp);
>>> }
>>>
>>> which requires more effort for the implementor of the new API, but
>>> maintains compatibility if you want to use a new jar including the
>>> TimestampExtractor even with the old version of streams/the
>>> TimestampExtractor interface (since it will correctly dispatch to the old
>>> implementation). It requires more effort on the part of the framework
>> since
>>> it needs to catch runtime exceptions when the second version of extract()
>>> is missing and fall back to the first version. But in some cases that
>> might
>>> be warranted for the sake of compatibility.
>>>
>>> I suspect this update won't cause too much pain right now just because
>> the
>>> number of streams app any user has won't be too large quite yet, but this
>>> seems important to consider moving forward. I think we had some similar
>>> concerns & discussion around the changes to the consumer APIs when trying
>>> to generalize the collection types used in those APIs.
>>>
>>> -Ewen
>>>
>>>
>>> On Mon, Nov 28, 2016 at 10:46 AM, Matthias J. Sax <matthias@confluent.io
>>>
>>> wrote:
>>>
>>>> Done.
>>>>
>>>> If there is no further comments, I would like to start a voting thread
>>>> in a separate email.
>>>>
>>>> -Matthias
>>>>
>>>> On 11/28/16 9:08 AM, Guozhang Wang wrote:
>>>>> Yes it does not include these, again in my previous previous email I
>>>> meant
>>>>> when you say "This is a breaking, incompatible change" people may
>>>> interpret
>>>>> it differently. So better explain it more clearly.
>>>>>
>>>>>
>>>>> Guozhang
>>>>>
>>>>> On Thu, Nov 24, 2016 at 10:31 PM, Matthias J. Sax <
>> matthias@confluent.io
>>>>>
>>>>> wrote:
>>>>>
>>>>>> That does make sense. But KIP-93 does not change anything like this,
>> so
>>>>>> there is nothing to mention, IMHO.
>>>>>>
>>>>>> Or do you mean, the KIP should include that the change is backward
>>>>>> compatible with this regard?
>>>>>>
>>>>>> -Matthias
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 11/24/16 5:31 PM, Guozhang Wang wrote:
>>>>>>> What I meant is that, for some changes (e.g. say we change the
>>>>>>> auto-repartition behavior that caused using different name
>> conventions,
>>>>>> or
>>>>>>> some changes that involve changing the underlying state store names,
>>>> etc)
>>>>>>> the existing internal state including the stores and topics will
>>>> probably
>>>>>>> not valid. Some users consider this also as a "backward incompatible
>>>>>>> change" since they cannot just swipe in the new jar and restart.
>>>>>>>
>>>>>>>
>>>>>>> Guozhang
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Nov 23, 2016 at 3:20 PM, Matthias J. Sax <
>>>> matthias@confluent.io>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Thanks for the feedback. I updated the KIP for (1) and (2).
>>>>>>>>
>>>>>>>> However not for (3): Why should it be required to reset an
>>>> application?
>>>>>>>> If user processed "good" data with valid timestamps, behavior does
>> not
>>>>>>>> change. If user tried to process "bad" data with invalid timestamps,
>>>> the
>>>>>>>> application does fail currently anyway, so there is nothing to
>> reset.
>>>>>>>>
>>>>>>>>
>>>>>>>> -Matthias
>>>>>>>>
>>>>>>>> On 11/22/16 9:53 AM, Guozhang Wang wrote:
>>>>>>>>> Regarding the "compatibility" section, I would suggest being a bit
>>>> more
>>>>>>>>> specific about why it is a breaking change. For Streams, it could
>>>> mean
>>>>>>>>> different things:
>>>>>>>>>
>>>>>>>>> 1. User need code change when switching library dependency on the
>> new
>>>>>>>>> version, otherwise it won't compile(I think this is the case for
>> this
>>>>>>>> KIP).
>>>>>>>>> 2. User need code change when switching library dependency on the
>> new
>>>>>>>>> version, otherwise runtime exception will be thrown.
>>>>>>>>> 3. Existing application state as well as internal topics need to be
>>>>>>>> swiped
>>>>>>>>> and the program need to restart from zero.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Guozhang
>>>>>>>>>
>>>>>>>>> On Fri, Nov 18, 2016 at 12:27 PM, Matthias J. Sax <
>>>>>> matthias@confluent.io
>>>>>>>>>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> I want to start a discussion about KIP-93:
>>>>>>>>>>
>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>>>>>>>>> 93%3A+Improve+invalid+timestamp+handling+in+Kafka+Streams
>>>>>>>>>>
>>>>>>>>>> Looking forward to your feedback.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> -Matthias
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
> 
> 


Re: [DISCUSS] KIP-93: Improve invalid timestamp handling in Kafka Streams

Posted by Guozhang Wang <wa...@gmail.com>.
You mean "it is a backward incompatible change" right?

On Wed, Nov 30, 2016 at 4:28 PM, Matthias J. Sax <ma...@confluent.io>
wrote:

> Thanks for this clarification (and your +1)
>
> I completely agree and just want to add my thoughts:
>
> 1. Yes, it is a backward compatible change but as I discusses with
> others, we want accept this for now. All previous releases did contain
> non-compatible changes for Kafka Streams, too. And as Kafka Streams API
> is not guaranteed to be stable at this point, we better do breaking
> changes now than later.
>
> 2. At some point, we need to be more conservative with breaking chances
> and only allow them for major releases.
>
> 3. As we expect that most people do use default timestamp extractor,
> they will not be effected anyway. Only if custom extractors are used,
> the application needs to be recompiled. Thus, the effort to make the
> change backward compatible seems not to be worth the effort.
>
>
> -Matthias
>
> On 11/29/16 9:57 PM, Ewen Cheslack-Postava wrote:
> > I think this looks reasonable, but just a more general note on
> > compatibility -- I think it's worth trying to clarify what types of
> > compatibility we're trying to achieve. Guozhang's 1 & 2 give an important
> > breakdown (compile vs runtime compatibility). For the type of change
> > described here, I think it makes sense to clarify the compatibility
> goals.
> > The (pure) compile time compatibility vs (pure) runtime compatibility
> > aren't the only options -- you have some additional intermediate choices
> as
> > well.
> >
> > The proposal describes a change which requires recompiling the plugin
> > (TimestampExtractor) *and* substituting a runtime library (kafka-streams
> > jar) to get correct updated behavior. This can get interesting if you
> > already have N streams apps sharing the same TimestampExtractor. You now
> > *must* update all of them to the new streams jar if any are to be updated
> > for the new TimestampExtractor API. For folks with a monolithic
> > repo/versioning setup, this could potentially be painful since they're
> > forced to update all apps at once. It's at least not too bad since it can
> > be isolated to a single commit (without deployment needing to be
> > coordinated, for example), but if the # of apps gets > 4 or 5, these
> types
> > of updates start to be a real pain.
> >
> > I think this API change is an acceptable (albeit annoying) API
> > incompatibility right now, but wanted to raise this in the discussion of
> > this KIP so we consider this moving forward. There definitely are
> > alternatives that add the new functionality but maintain compatibility
> > better. In particular, it's possible to define the new interface to
> require
> > both APIs:
> >
> > // new interface
> > public interface TimestampExtractor {
> >     long extract(ConsumerRecord<Object, Object> record);
> >     long extract(ConsumerRecord<Object, Object> record, long
> > previousTimestamp);
> > }
> >
> > which requires more effort for the implementor of the new API, but
> > maintains compatibility if you want to use a new jar including the
> > TimestampExtractor even with the old version of streams/the
> > TimestampExtractor interface (since it will correctly dispatch to the old
> > implementation). It requires more effort on the part of the framework
> since
> > it needs to catch runtime exceptions when the second version of extract()
> > is missing and fall back to the first version. But in some cases that
> might
> > be warranted for the sake of compatibility.
> >
> > I suspect this update won't cause too much pain right now just because
> the
> > number of streams app any user has won't be too large quite yet, but this
> > seems important to consider moving forward. I think we had some similar
> > concerns & discussion around the changes to the consumer APIs when trying
> > to generalize the collection types used in those APIs.
> >
> > -Ewen
> >
> >
> > On Mon, Nov 28, 2016 at 10:46 AM, Matthias J. Sax <matthias@confluent.io
> >
> > wrote:
> >
> >> Done.
> >>
> >> If there is no further comments, I would like to start a voting thread
> >> in a separate email.
> >>
> >> -Matthias
> >>
> >> On 11/28/16 9:08 AM, Guozhang Wang wrote:
> >>> Yes it does not include these, again in my previous previous email I
> >> meant
> >>> when you say "This is a breaking, incompatible change" people may
> >> interpret
> >>> it differently. So better explain it more clearly.
> >>>
> >>>
> >>> Guozhang
> >>>
> >>> On Thu, Nov 24, 2016 at 10:31 PM, Matthias J. Sax <
> matthias@confluent.io
> >>>
> >>> wrote:
> >>>
> >>>> That does make sense. But KIP-93 does not change anything like this,
> so
> >>>> there is nothing to mention, IMHO.
> >>>>
> >>>> Or do you mean, the KIP should include that the change is backward
> >>>> compatible with this regard?
> >>>>
> >>>> -Matthias
> >>>>
> >>>>
> >>>>
> >>>> On 11/24/16 5:31 PM, Guozhang Wang wrote:
> >>>>> What I meant is that, for some changes (e.g. say we change the
> >>>>> auto-repartition behavior that caused using different name
> conventions,
> >>>> or
> >>>>> some changes that involve changing the underlying state store names,
> >> etc)
> >>>>> the existing internal state including the stores and topics will
> >> probably
> >>>>> not valid. Some users consider this also as a "backward incompatible
> >>>>> change" since they cannot just swipe in the new jar and restart.
> >>>>>
> >>>>>
> >>>>> Guozhang
> >>>>>
> >>>>>
> >>>>> On Wed, Nov 23, 2016 at 3:20 PM, Matthias J. Sax <
> >> matthias@confluent.io>
> >>>>> wrote:
> >>>>>
> >>>>>> Thanks for the feedback. I updated the KIP for (1) and (2).
> >>>>>>
> >>>>>> However not for (3): Why should it be required to reset an
> >> application?
> >>>>>> If user processed "good" data with valid timestamps, behavior does
> not
> >>>>>> change. If user tried to process "bad" data with invalid timestamps,
> >> the
> >>>>>> application does fail currently anyway, so there is nothing to
> reset.
> >>>>>>
> >>>>>>
> >>>>>> -Matthias
> >>>>>>
> >>>>>> On 11/22/16 9:53 AM, Guozhang Wang wrote:
> >>>>>>> Regarding the "compatibility" section, I would suggest being a bit
> >> more
> >>>>>>> specific about why it is a breaking change. For Streams, it could
> >> mean
> >>>>>>> different things:
> >>>>>>>
> >>>>>>> 1. User need code change when switching library dependency on the
> new
> >>>>>>> version, otherwise it won't compile(I think this is the case for
> this
> >>>>>> KIP).
> >>>>>>> 2. User need code change when switching library dependency on the
> new
> >>>>>>> version, otherwise runtime exception will be thrown.
> >>>>>>> 3. Existing application state as well as internal topics need to be
> >>>>>> swiped
> >>>>>>> and the program need to restart from zero.
> >>>>>>>
> >>>>>>>
> >>>>>>> Guozhang
> >>>>>>>
> >>>>>>> On Fri, Nov 18, 2016 at 12:27 PM, Matthias J. Sax <
> >>>> matthias@confluent.io
> >>>>>>>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> Hi all,
> >>>>>>>>
> >>>>>>>> I want to start a discussion about KIP-93:
> >>>>>>>>
> >>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >>>>>>>> 93%3A+Improve+invalid+timestamp+handling+in+Kafka+Streams
> >>>>>>>>
> >>>>>>>> Looking forward to your feedback.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> -Matthias
> >>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>
> >>>
> >>
> >>
> >
> >
>
>


-- 
-- Guozhang

Re: [DISCUSS] KIP-93: Improve invalid timestamp handling in Kafka Streams

Posted by "Matthias J. Sax" <ma...@confluent.io>.
Thanks for this clarification (and your +1)

I completely agree and just want to add my thoughts:

1. Yes, it is a backward compatible change but as I discusses with
others, we want accept this for now. All previous releases did contain
non-compatible changes for Kafka Streams, too. And as Kafka Streams API
is not guaranteed to be stable at this point, we better do breaking
changes now than later.

2. At some point, we need to be more conservative with breaking chances
and only allow them for major releases.

3. As we expect that most people do use default timestamp extractor,
they will not be effected anyway. Only if custom extractors are used,
the application needs to be recompiled. Thus, the effort to make the
change backward compatible seems not to be worth the effort.


-Matthias

On 11/29/16 9:57 PM, Ewen Cheslack-Postava wrote:
> I think this looks reasonable, but just a more general note on
> compatibility -- I think it's worth trying to clarify what types of
> compatibility we're trying to achieve. Guozhang's 1 & 2 give an important
> breakdown (compile vs runtime compatibility). For the type of change
> described here, I think it makes sense to clarify the compatibility goals.
> The (pure) compile time compatibility vs (pure) runtime compatibility
> aren't the only options -- you have some additional intermediate choices as
> well.
> 
> The proposal describes a change which requires recompiling the plugin
> (TimestampExtractor) *and* substituting a runtime library (kafka-streams
> jar) to get correct updated behavior. This can get interesting if you
> already have N streams apps sharing the same TimestampExtractor. You now
> *must* update all of them to the new streams jar if any are to be updated
> for the new TimestampExtractor API. For folks with a monolithic
> repo/versioning setup, this could potentially be painful since they're
> forced to update all apps at once. It's at least not too bad since it can
> be isolated to a single commit (without deployment needing to be
> coordinated, for example), but if the # of apps gets > 4 or 5, these types
> of updates start to be a real pain.
> 
> I think this API change is an acceptable (albeit annoying) API
> incompatibility right now, but wanted to raise this in the discussion of
> this KIP so we consider this moving forward. There definitely are
> alternatives that add the new functionality but maintain compatibility
> better. In particular, it's possible to define the new interface to require
> both APIs:
> 
> // new interface
> public interface TimestampExtractor {
>     long extract(ConsumerRecord<Object, Object> record);
>     long extract(ConsumerRecord<Object, Object> record, long
> previousTimestamp);
> }
> 
> which requires more effort for the implementor of the new API, but
> maintains compatibility if you want to use a new jar including the
> TimestampExtractor even with the old version of streams/the
> TimestampExtractor interface (since it will correctly dispatch to the old
> implementation). It requires more effort on the part of the framework since
> it needs to catch runtime exceptions when the second version of extract()
> is missing and fall back to the first version. But in some cases that might
> be warranted for the sake of compatibility.
> 
> I suspect this update won't cause too much pain right now just because the
> number of streams app any user has won't be too large quite yet, but this
> seems important to consider moving forward. I think we had some similar
> concerns & discussion around the changes to the consumer APIs when trying
> to generalize the collection types used in those APIs.
> 
> -Ewen
> 
> 
> On Mon, Nov 28, 2016 at 10:46 AM, Matthias J. Sax <ma...@confluent.io>
> wrote:
> 
>> Done.
>>
>> If there is no further comments, I would like to start a voting thread
>> in a separate email.
>>
>> -Matthias
>>
>> On 11/28/16 9:08 AM, Guozhang Wang wrote:
>>> Yes it does not include these, again in my previous previous email I
>> meant
>>> when you say "This is a breaking, incompatible change" people may
>> interpret
>>> it differently. So better explain it more clearly.
>>>
>>>
>>> Guozhang
>>>
>>> On Thu, Nov 24, 2016 at 10:31 PM, Matthias J. Sax <matthias@confluent.io
>>>
>>> wrote:
>>>
>>>> That does make sense. But KIP-93 does not change anything like this, so
>>>> there is nothing to mention, IMHO.
>>>>
>>>> Or do you mean, the KIP should include that the change is backward
>>>> compatible with this regard?
>>>>
>>>> -Matthias
>>>>
>>>>
>>>>
>>>> On 11/24/16 5:31 PM, Guozhang Wang wrote:
>>>>> What I meant is that, for some changes (e.g. say we change the
>>>>> auto-repartition behavior that caused using different name conventions,
>>>> or
>>>>> some changes that involve changing the underlying state store names,
>> etc)
>>>>> the existing internal state including the stores and topics will
>> probably
>>>>> not valid. Some users consider this also as a "backward incompatible
>>>>> change" since they cannot just swipe in the new jar and restart.
>>>>>
>>>>>
>>>>> Guozhang
>>>>>
>>>>>
>>>>> On Wed, Nov 23, 2016 at 3:20 PM, Matthias J. Sax <
>> matthias@confluent.io>
>>>>> wrote:
>>>>>
>>>>>> Thanks for the feedback. I updated the KIP for (1) and (2).
>>>>>>
>>>>>> However not for (3): Why should it be required to reset an
>> application?
>>>>>> If user processed "good" data with valid timestamps, behavior does not
>>>>>> change. If user tried to process "bad" data with invalid timestamps,
>> the
>>>>>> application does fail currently anyway, so there is nothing to reset.
>>>>>>
>>>>>>
>>>>>> -Matthias
>>>>>>
>>>>>> On 11/22/16 9:53 AM, Guozhang Wang wrote:
>>>>>>> Regarding the "compatibility" section, I would suggest being a bit
>> more
>>>>>>> specific about why it is a breaking change. For Streams, it could
>> mean
>>>>>>> different things:
>>>>>>>
>>>>>>> 1. User need code change when switching library dependency on the new
>>>>>>> version, otherwise it won't compile(I think this is the case for this
>>>>>> KIP).
>>>>>>> 2. User need code change when switching library dependency on the new
>>>>>>> version, otherwise runtime exception will be thrown.
>>>>>>> 3. Existing application state as well as internal topics need to be
>>>>>> swiped
>>>>>>> and the program need to restart from zero.
>>>>>>>
>>>>>>>
>>>>>>> Guozhang
>>>>>>>
>>>>>>> On Fri, Nov 18, 2016 at 12:27 PM, Matthias J. Sax <
>>>> matthias@confluent.io
>>>>>>>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I want to start a discussion about KIP-93:
>>>>>>>>
>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>>>>>>> 93%3A+Improve+invalid+timestamp+handling+in+Kafka+Streams
>>>>>>>>
>>>>>>>> Looking forward to your feedback.
>>>>>>>>
>>>>>>>>
>>>>>>>> -Matthias
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
> 
> 


Re: [DISCUSS] KIP-93: Improve invalid timestamp handling in Kafka Streams

Posted by Ewen Cheslack-Postava <ew...@confluent.io>.
I think this looks reasonable, but just a more general note on
compatibility -- I think it's worth trying to clarify what types of
compatibility we're trying to achieve. Guozhang's 1 & 2 give an important
breakdown (compile vs runtime compatibility). For the type of change
described here, I think it makes sense to clarify the compatibility goals.
The (pure) compile time compatibility vs (pure) runtime compatibility
aren't the only options -- you have some additional intermediate choices as
well.

The proposal describes a change which requires recompiling the plugin
(TimestampExtractor) *and* substituting a runtime library (kafka-streams
jar) to get correct updated behavior. This can get interesting if you
already have N streams apps sharing the same TimestampExtractor. You now
*must* update all of them to the new streams jar if any are to be updated
for the new TimestampExtractor API. For folks with a monolithic
repo/versioning setup, this could potentially be painful since they're
forced to update all apps at once. It's at least not too bad since it can
be isolated to a single commit (without deployment needing to be
coordinated, for example), but if the # of apps gets > 4 or 5, these types
of updates start to be a real pain.

I think this API change is an acceptable (albeit annoying) API
incompatibility right now, but wanted to raise this in the discussion of
this KIP so we consider this moving forward. There definitely are
alternatives that add the new functionality but maintain compatibility
better. In particular, it's possible to define the new interface to require
both APIs:

// new interface
public interface TimestampExtractor {
    long extract(ConsumerRecord<Object, Object> record);
    long extract(ConsumerRecord<Object, Object> record, long
previousTimestamp);
}

which requires more effort for the implementor of the new API, but
maintains compatibility if you want to use a new jar including the
TimestampExtractor even with the old version of streams/the
TimestampExtractor interface (since it will correctly dispatch to the old
implementation). It requires more effort on the part of the framework since
it needs to catch runtime exceptions when the second version of extract()
is missing and fall back to the first version. But in some cases that might
be warranted for the sake of compatibility.

I suspect this update won't cause too much pain right now just because the
number of streams app any user has won't be too large quite yet, but this
seems important to consider moving forward. I think we had some similar
concerns & discussion around the changes to the consumer APIs when trying
to generalize the collection types used in those APIs.

-Ewen


On Mon, Nov 28, 2016 at 10:46 AM, Matthias J. Sax <ma...@confluent.io>
wrote:

> Done.
>
> If there is no further comments, I would like to start a voting thread
> in a separate email.
>
> -Matthias
>
> On 11/28/16 9:08 AM, Guozhang Wang wrote:
> > Yes it does not include these, again in my previous previous email I
> meant
> > when you say "This is a breaking, incompatible change" people may
> interpret
> > it differently. So better explain it more clearly.
> >
> >
> > Guozhang
> >
> > On Thu, Nov 24, 2016 at 10:31 PM, Matthias J. Sax <matthias@confluent.io
> >
> > wrote:
> >
> >> That does make sense. But KIP-93 does not change anything like this, so
> >> there is nothing to mention, IMHO.
> >>
> >> Or do you mean, the KIP should include that the change is backward
> >> compatible with this regard?
> >>
> >> -Matthias
> >>
> >>
> >>
> >> On 11/24/16 5:31 PM, Guozhang Wang wrote:
> >>> What I meant is that, for some changes (e.g. say we change the
> >>> auto-repartition behavior that caused using different name conventions,
> >> or
> >>> some changes that involve changing the underlying state store names,
> etc)
> >>> the existing internal state including the stores and topics will
> probably
> >>> not valid. Some users consider this also as a "backward incompatible
> >>> change" since they cannot just swipe in the new jar and restart.
> >>>
> >>>
> >>> Guozhang
> >>>
> >>>
> >>> On Wed, Nov 23, 2016 at 3:20 PM, Matthias J. Sax <
> matthias@confluent.io>
> >>> wrote:
> >>>
> >>>> Thanks for the feedback. I updated the KIP for (1) and (2).
> >>>>
> >>>> However not for (3): Why should it be required to reset an
> application?
> >>>> If user processed "good" data with valid timestamps, behavior does not
> >>>> change. If user tried to process "bad" data with invalid timestamps,
> the
> >>>> application does fail currently anyway, so there is nothing to reset.
> >>>>
> >>>>
> >>>> -Matthias
> >>>>
> >>>> On 11/22/16 9:53 AM, Guozhang Wang wrote:
> >>>>> Regarding the "compatibility" section, I would suggest being a bit
> more
> >>>>> specific about why it is a breaking change. For Streams, it could
> mean
> >>>>> different things:
> >>>>>
> >>>>> 1. User need code change when switching library dependency on the new
> >>>>> version, otherwise it won't compile(I think this is the case for this
> >>>> KIP).
> >>>>> 2. User need code change when switching library dependency on the new
> >>>>> version, otherwise runtime exception will be thrown.
> >>>>> 3. Existing application state as well as internal topics need to be
> >>>> swiped
> >>>>> and the program need to restart from zero.
> >>>>>
> >>>>>
> >>>>> Guozhang
> >>>>>
> >>>>> On Fri, Nov 18, 2016 at 12:27 PM, Matthias J. Sax <
> >> matthias@confluent.io
> >>>>>
> >>>>> wrote:
> >>>>>
> >>>>>> Hi all,
> >>>>>>
> >>>>>> I want to start a discussion about KIP-93:
> >>>>>>
> >>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >>>>>> 93%3A+Improve+invalid+timestamp+handling+in+Kafka+Streams
> >>>>>>
> >>>>>> Looking forward to your feedback.
> >>>>>>
> >>>>>>
> >>>>>> -Matthias
> >>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>
> >>>
> >>
> >>
> >
> >
>
>


-- 
Thanks,
Ewen

Re: [DISCUSS] KIP-93: Improve invalid timestamp handling in Kafka Streams

Posted by "Matthias J. Sax" <ma...@confluent.io>.
Done.

If there is no further comments, I would like to start a voting thread
in a separate email.

-Matthias

On 11/28/16 9:08 AM, Guozhang Wang wrote:
> Yes it does not include these, again in my previous previous email I meant
> when you say "This is a breaking, incompatible change" people may interpret
> it differently. So better explain it more clearly.
> 
> 
> Guozhang
> 
> On Thu, Nov 24, 2016 at 10:31 PM, Matthias J. Sax <ma...@confluent.io>
> wrote:
> 
>> That does make sense. But KIP-93 does not change anything like this, so
>> there is nothing to mention, IMHO.
>>
>> Or do you mean, the KIP should include that the change is backward
>> compatible with this regard?
>>
>> -Matthias
>>
>>
>>
>> On 11/24/16 5:31 PM, Guozhang Wang wrote:
>>> What I meant is that, for some changes (e.g. say we change the
>>> auto-repartition behavior that caused using different name conventions,
>> or
>>> some changes that involve changing the underlying state store names, etc)
>>> the existing internal state including the stores and topics will probably
>>> not valid. Some users consider this also as a "backward incompatible
>>> change" since they cannot just swipe in the new jar and restart.
>>>
>>>
>>> Guozhang
>>>
>>>
>>> On Wed, Nov 23, 2016 at 3:20 PM, Matthias J. Sax <ma...@confluent.io>
>>> wrote:
>>>
>>>> Thanks for the feedback. I updated the KIP for (1) and (2).
>>>>
>>>> However not for (3): Why should it be required to reset an application?
>>>> If user processed "good" data with valid timestamps, behavior does not
>>>> change. If user tried to process "bad" data with invalid timestamps, the
>>>> application does fail currently anyway, so there is nothing to reset.
>>>>
>>>>
>>>> -Matthias
>>>>
>>>> On 11/22/16 9:53 AM, Guozhang Wang wrote:
>>>>> Regarding the "compatibility" section, I would suggest being a bit more
>>>>> specific about why it is a breaking change. For Streams, it could mean
>>>>> different things:
>>>>>
>>>>> 1. User need code change when switching library dependency on the new
>>>>> version, otherwise it won't compile(I think this is the case for this
>>>> KIP).
>>>>> 2. User need code change when switching library dependency on the new
>>>>> version, otherwise runtime exception will be thrown.
>>>>> 3. Existing application state as well as internal topics need to be
>>>> swiped
>>>>> and the program need to restart from zero.
>>>>>
>>>>>
>>>>> Guozhang
>>>>>
>>>>> On Fri, Nov 18, 2016 at 12:27 PM, Matthias J. Sax <
>> matthias@confluent.io
>>>>>
>>>>> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I want to start a discussion about KIP-93:
>>>>>>
>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>>>>> 93%3A+Improve+invalid+timestamp+handling+in+Kafka+Streams
>>>>>>
>>>>>> Looking forward to your feedback.
>>>>>>
>>>>>>
>>>>>> -Matthias
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
> 
> 


Re: [DISCUSS] KIP-93: Improve invalid timestamp handling in Kafka Streams

Posted by Guozhang Wang <wa...@gmail.com>.
Yes it does not include these, again in my previous previous email I meant
when you say "This is a breaking, incompatible change" people may interpret
it differently. So better explain it more clearly.


Guozhang

On Thu, Nov 24, 2016 at 10:31 PM, Matthias J. Sax <ma...@confluent.io>
wrote:

> That does make sense. But KIP-93 does not change anything like this, so
> there is nothing to mention, IMHO.
>
> Or do you mean, the KIP should include that the change is backward
> compatible with this regard?
>
> -Matthias
>
>
>
> On 11/24/16 5:31 PM, Guozhang Wang wrote:
> > What I meant is that, for some changes (e.g. say we change the
> > auto-repartition behavior that caused using different name conventions,
> or
> > some changes that involve changing the underlying state store names, etc)
> > the existing internal state including the stores and topics will probably
> > not valid. Some users consider this also as a "backward incompatible
> > change" since they cannot just swipe in the new jar and restart.
> >
> >
> > Guozhang
> >
> >
> > On Wed, Nov 23, 2016 at 3:20 PM, Matthias J. Sax <ma...@confluent.io>
> > wrote:
> >
> >> Thanks for the feedback. I updated the KIP for (1) and (2).
> >>
> >> However not for (3): Why should it be required to reset an application?
> >> If user processed "good" data with valid timestamps, behavior does not
> >> change. If user tried to process "bad" data with invalid timestamps, the
> >> application does fail currently anyway, so there is nothing to reset.
> >>
> >>
> >> -Matthias
> >>
> >> On 11/22/16 9:53 AM, Guozhang Wang wrote:
> >>> Regarding the "compatibility" section, I would suggest being a bit more
> >>> specific about why it is a breaking change. For Streams, it could mean
> >>> different things:
> >>>
> >>> 1. User need code change when switching library dependency on the new
> >>> version, otherwise it won't compile(I think this is the case for this
> >> KIP).
> >>> 2. User need code change when switching library dependency on the new
> >>> version, otherwise runtime exception will be thrown.
> >>> 3. Existing application state as well as internal topics need to be
> >> swiped
> >>> and the program need to restart from zero.
> >>>
> >>>
> >>> Guozhang
> >>>
> >>> On Fri, Nov 18, 2016 at 12:27 PM, Matthias J. Sax <
> matthias@confluent.io
> >>>
> >>> wrote:
> >>>
> >>>> Hi all,
> >>>>
> >>>> I want to start a discussion about KIP-93:
> >>>>
> >>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >>>> 93%3A+Improve+invalid+timestamp+handling+in+Kafka+Streams
> >>>>
> >>>> Looking forward to your feedback.
> >>>>
> >>>>
> >>>> -Matthias
> >>>>
> >>>>
> >>>
> >>>
> >>
> >>
> >
> >
>
>


-- 
-- Guozhang

Re: [DISCUSS] KIP-93: Improve invalid timestamp handling in Kafka Streams

Posted by "Matthias J. Sax" <ma...@confluent.io>.
That does make sense. But KIP-93 does not change anything like this, so
there is nothing to mention, IMHO.

Or do you mean, the KIP should include that the change is backward
compatible with this regard?

-Matthias



On 11/24/16 5:31 PM, Guozhang Wang wrote:
> What I meant is that, for some changes (e.g. say we change the
> auto-repartition behavior that caused using different name conventions, or
> some changes that involve changing the underlying state store names, etc)
> the existing internal state including the stores and topics will probably
> not valid. Some users consider this also as a "backward incompatible
> change" since they cannot just swipe in the new jar and restart.
> 
> 
> Guozhang
> 
> 
> On Wed, Nov 23, 2016 at 3:20 PM, Matthias J. Sax <ma...@confluent.io>
> wrote:
> 
>> Thanks for the feedback. I updated the KIP for (1) and (2).
>>
>> However not for (3): Why should it be required to reset an application?
>> If user processed "good" data with valid timestamps, behavior does not
>> change. If user tried to process "bad" data with invalid timestamps, the
>> application does fail currently anyway, so there is nothing to reset.
>>
>>
>> -Matthias
>>
>> On 11/22/16 9:53 AM, Guozhang Wang wrote:
>>> Regarding the "compatibility" section, I would suggest being a bit more
>>> specific about why it is a breaking change. For Streams, it could mean
>>> different things:
>>>
>>> 1. User need code change when switching library dependency on the new
>>> version, otherwise it won't compile(I think this is the case for this
>> KIP).
>>> 2. User need code change when switching library dependency on the new
>>> version, otherwise runtime exception will be thrown.
>>> 3. Existing application state as well as internal topics need to be
>> swiped
>>> and the program need to restart from zero.
>>>
>>>
>>> Guozhang
>>>
>>> On Fri, Nov 18, 2016 at 12:27 PM, Matthias J. Sax <matthias@confluent.io
>>>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I want to start a discussion about KIP-93:
>>>>
>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>>> 93%3A+Improve+invalid+timestamp+handling+in+Kafka+Streams
>>>>
>>>> Looking forward to your feedback.
>>>>
>>>>
>>>> -Matthias
>>>>
>>>>
>>>
>>>
>>
>>
> 
> 


Re: [DISCUSS] KIP-93: Improve invalid timestamp handling in Kafka Streams

Posted by Guozhang Wang <wa...@gmail.com>.
What I meant is that, for some changes (e.g. say we change the
auto-repartition behavior that caused using different name conventions, or
some changes that involve changing the underlying state store names, etc)
the existing internal state including the stores and topics will probably
not valid. Some users consider this also as a "backward incompatible
change" since they cannot just swipe in the new jar and restart.


Guozhang


On Wed, Nov 23, 2016 at 3:20 PM, Matthias J. Sax <ma...@confluent.io>
wrote:

> Thanks for the feedback. I updated the KIP for (1) and (2).
>
> However not for (3): Why should it be required to reset an application?
> If user processed "good" data with valid timestamps, behavior does not
> change. If user tried to process "bad" data with invalid timestamps, the
> application does fail currently anyway, so there is nothing to reset.
>
>
> -Matthias
>
> On 11/22/16 9:53 AM, Guozhang Wang wrote:
> > Regarding the "compatibility" section, I would suggest being a bit more
> > specific about why it is a breaking change. For Streams, it could mean
> > different things:
> >
> > 1. User need code change when switching library dependency on the new
> > version, otherwise it won't compile(I think this is the case for this
> KIP).
> > 2. User need code change when switching library dependency on the new
> > version, otherwise runtime exception will be thrown.
> > 3. Existing application state as well as internal topics need to be
> swiped
> > and the program need to restart from zero.
> >
> >
> > Guozhang
> >
> > On Fri, Nov 18, 2016 at 12:27 PM, Matthias J. Sax <matthias@confluent.io
> >
> > wrote:
> >
> >> Hi all,
> >>
> >> I want to start a discussion about KIP-93:
> >>
> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >> 93%3A+Improve+invalid+timestamp+handling+in+Kafka+Streams
> >>
> >> Looking forward to your feedback.
> >>
> >>
> >> -Matthias
> >>
> >>
> >
> >
>
>


-- 
-- Guozhang

Re: [DISCUSS] KIP-93: Improve invalid timestamp handling in Kafka Streams

Posted by "Matthias J. Sax" <ma...@confluent.io>.
Thanks for the feedback. I updated the KIP for (1) and (2).

However not for (3): Why should it be required to reset an application?
If user processed "good" data with valid timestamps, behavior does not
change. If user tried to process "bad" data with invalid timestamps, the
application does fail currently anyway, so there is nothing to reset.


-Matthias

On 11/22/16 9:53 AM, Guozhang Wang wrote:
> Regarding the "compatibility" section, I would suggest being a bit more
> specific about why it is a breaking change. For Streams, it could mean
> different things:
> 
> 1. User need code change when switching library dependency on the new
> version, otherwise it won't compile(I think this is the case for this KIP).
> 2. User need code change when switching library dependency on the new
> version, otherwise runtime exception will be thrown.
> 3. Existing application state as well as internal topics need to be swiped
> and the program need to restart from zero.
> 
> 
> Guozhang
> 
> On Fri, Nov 18, 2016 at 12:27 PM, Matthias J. Sax <ma...@confluent.io>
> wrote:
> 
>> Hi all,
>>
>> I want to start a discussion about KIP-93:
>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>> 93%3A+Improve+invalid+timestamp+handling+in+Kafka+Streams
>>
>> Looking forward to your feedback.
>>
>>
>> -Matthias
>>
>>
> 
> 


Re: [DISCUSS] KIP-93: Improve invalid timestamp handling in Kafka Streams

Posted by Guozhang Wang <wa...@gmail.com>.
Regarding the "compatibility" section, I would suggest being a bit more
specific about why it is a breaking change. For Streams, it could mean
different things:

1. User need code change when switching library dependency on the new
version, otherwise it won't compile(I think this is the case for this KIP).
2. User need code change when switching library dependency on the new
version, otherwise runtime exception will be thrown.
3. Existing application state as well as internal topics need to be swiped
and the program need to restart from zero.


Guozhang

On Fri, Nov 18, 2016 at 12:27 PM, Matthias J. Sax <ma...@confluent.io>
wrote:

> Hi all,
>
> I want to start a discussion about KIP-93:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 93%3A+Improve+invalid+timestamp+handling+in+Kafka+Streams
>
> Looking forward to your feedback.
>
>
> -Matthias
>
>


-- 
-- Guozhang