Posted to dev@kafka.apache.org by Jeyhun Karimov <je...@gmail.com> on 2017/02/15 00:31:37 UTC

[DISCUSS] KIP-123: Allow per stream/table timestamp extractor

Dear community,

I want to share KIP-123 [1], which is based on issue KAFKA-4144 [2]. You
can check the PR in [3].

I would like to get your comments.

[1]
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=68714788
[2] https://issues.apache.org/jira/browse/KAFKA-4144
[3] https://github.com/apache/kafka/pull/2466


Cheers,
Jeyhun
-- 
-Cheers

Jeyhun

Re: [DISCUSS] KIP-123: Allow per stream/table timestamp extractor

Posted by Jeyhun Karimov <je...@gmail.com>.
Hi Matthias,

Done.

On Thu, Feb 16, 2017 at 7:24 PM Matthias J. Sax <ma...@confluent.io>
wrote:

> Jeyhun,
>
> can you please add the KIP to this table:
>
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals#KafkaImprovementProposals-KIPsunderdiscussion
>
> and to this list:
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams
>
> Thanks!
>
>
> -Matthias
-- 
-Cheers

Jeyhun

Re: [DISCUSS] KIP-123: Allow per stream/table timestamp extractor

Posted by "Matthias J. Sax" <ma...@confluent.io>.
Jeyhun,

can you please add the KIP to this table:
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals#KafkaImprovementProposals-KIPsunderdiscussion

and to this list:
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams

Thanks!


-Matthias

On 2/14/17 5:36 PM, Matthias J. Sax wrote:
> Mathieu,
> 
> I personally agree with your observation, and we have plans to submit a
> KIP like this. If you want to drive this discussion feel free to start
> the KIP by yourself!
> 
> Having said that, for this KIP we might want to focus the discussion on
> the actual feature that gets added: the ability to specify a different
> TS-Extractor for each input.
> 
> 
> 
> -Matthias
> 
> On 2/14/17 4:54 PM, Mathieu Fenniak wrote:
>> Hi Jeyhun,
>>
>> This KIP might not be the appropriate time, but my first thought reading it
>> is that it might make sense to introduce a builder-style API rather than
>> adding a mix of new method overloads with independent optional parameters.
>> :-)
>>
>> e.g., stream(), table(), globalTable(), addSource() could all accept a
>> "TopicReference" parameter that can be built like:
>> TopicReference("my-topic").keySerde(...).valueSerde(...).autoOffsetReset(...).timestampExtractor(...).build().
>>
>> Mathieu
> 
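A minimal sketch of the feature under discussion (a timestamp extractor chosen per input, falling back to an application-wide default), written in the builder style Mathieu suggests. All types here (`SimpleRecord`, `SimpleTimestampExtractor`, `TopicReference`) are hypothetical stand-ins rather than Kafka Streams classes, and the serde and offset-reset options from the proposal are omitted.

```java
import java.util.Objects;

// Hypothetical stand-in for a consumed record; not Kafka's ConsumerRecord.
final class SimpleRecord {
    final String topic;
    final long embeddedTimestampMs; // timestamp stored with the record, ms since epoch
    final long payloadEventTimeMs;  // hypothetical event-time field inside the value

    SimpleRecord(String topic, long embeddedTimestampMs, long payloadEventTimeMs) {
        this.topic = topic;
        this.embeddedTimestampMs = embeddedTimestampMs;
        this.payloadEventTimeMs = payloadEventTimeMs;
    }
}

// Hypothetical single-method extractor, loosely modeled on TimestampExtractor.
interface SimpleTimestampExtractor {
    long extract(SimpleRecord record);
}

// Hypothetical builder-style source description in the spirit of "TopicReference".
final class TopicReference {
    final String topic;
    private final SimpleTimestampExtractor timestampExtractor; // null = not set for this topic

    private TopicReference(Builder builder) {
        this.topic = builder.topic;
        this.timestampExtractor = builder.timestampExtractor;
    }

    static Builder of(String topic) {
        return new Builder(topic);
    }

    // Use this topic's extractor if one was configured, otherwise the global default.
    SimpleTimestampExtractor resolveExtractor(SimpleTimestampExtractor globalDefault) {
        return timestampExtractor != null ? timestampExtractor : globalDefault;
    }

    static final class Builder {
        private final String topic;
        private SimpleTimestampExtractor timestampExtractor;

        private Builder(String topic) {
            this.topic = Objects.requireNonNull(topic, "topic");
        }

        Builder timestampExtractor(SimpleTimestampExtractor extractor) {
            this.timestampExtractor = extractor;
            return this;
        }

        TopicReference build() {
            return new TopicReference(this);
        }
    }
}
```

Keeping every setting optional on one builder avoids adding an overload for each combination of serdes, offset reset, and extractor, which is the trade-off Mathieu raises.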


Re: [DISCUSS] KIP-123: Allow per stream/table timestamp extractor

Posted by Guozhang Wang <wa...@gmail.com>.
Hi Ewen,

Personally, I'm thinking of removing deprecated APIs after one minor release,
so yes, 0.12.0.0 would be the preferred timeline.


Guozhang

On Thu, Apr 6, 2017 at 8:40 PM, Ewen Cheslack-Postava <ew...@confluent.io>
wrote:

> I agree there's a potential issue that Eno raises, but probably it's not
> worth worrying about trying to enforce it. It seems like we have the right
> default for long term and for most cases a custom timestamp extractor
> should only be needed for legacy data that isn't putting a timestamp in the
> record. I think that if the user chooses to customize the timestamp
> extractor, it's reasonable to expect them to read enough docs to understand
> the implications. (By the way, this is also, I think, a case where fewer
> overloads are fine because it should be unusual and a relatively advanced
> use case, so expecting the user to fill in some extra parameters even if
> they are using the default values is ok in order to keep the API tighter.
> However, if we're heading towards a builder API that won't matter anyway.)
>
> One other question I have is what the planned removal timeline is for the
> deprecated configs -- the KIP just says they will be deprecated but not
> when they will be removed. Specifying this is important for users so they
> can plan how aggressively they need to switch. Generally removal is tied to
> major releases, so I assume the thought is to remove in 0.12.0.0?
>
> -Ewen



-- 
-- Guozhang

Re: [DISCUSS] KIP-123: Allow per stream/table timestamp extractor

Posted by Ewen Cheslack-Postava <ew...@confluent.io>.
I agree there's a potential issue that Eno raises, but probably it's not
worth worrying about trying to enforce it. It seems like we have the right
default for long term and for most cases a custom timestamp extractor
should only be needed for legacy data that isn't putting a timestamp in the
record. I think that if the user chooses to customize the timestamp
extractor, it's reasonable to expect them to read enough docs to understand
the implications. (By the way, this is also, I think, a case where fewer
overloads are fine because it should be unusual and a relatively advanced
use case, so expecting the user to fill in some extra parameters even if
they are using the default values is ok in order to keep the API tighter.
However, if we're heading towards a builder API that won't matter anyway.)

One other question I have is what the planned removal timeline is for the
deprecated configs -- the KIP just says they will be deprecated but not
when they will be removed. Specifying this is important for users so they
can plan how aggressively they need to switch. Generally removal is tied to
major releases, so I assume the thought is to remove in 0.12.0.0?

-Ewen

On Tue, Feb 28, 2017 at 9:21 AM, Matthias J. Sax <ma...@confluent.io>
wrote:

> Eno,
>
> I think this problem is out-of-scope and also present in the current
> setting. We cannot prevent a custom timestamp extractor from using an
> if-else branch and returning different timestamps for different topics.
> That is possible even right now.
>
> Furthermore, the TimestampExtractor interface states:
>
> > The extracted timestamp MUST represent the milliseconds since midnight,
> > January 1, 1970 UTC.
>
> If users don't follow this, there is nothing we can do about it.
>
>
> -Matthias

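Ewen's point that a custom extractor should mostly be needed for legacy records without an embedded timestamp can be sketched as a fallback extractor. This assumes a negative embedded timestamp marks a missing one (the Kafka message format uses -1 for this); `LegacyAwareRecord`, its payload field, and the `previousTimestampMs` parameter are hypothetical and not part of any Kafka API shown in this thread.

```java
// Hypothetical record shape for this sketch; not Kafka's ConsumerRecord.
final class LegacyAwareRecord {
    final long embeddedTimestampMs; // negative when no timestamp was written
    final Long payloadTimestampMs;  // hypothetical field parsed from the value; may be null

    LegacyAwareRecord(long embeddedTimestampMs, Long payloadTimestampMs) {
        this.embeddedTimestampMs = embeddedTimestampMs;
        this.payloadTimestampMs = payloadTimestampMs;
    }
}

// Hypothetical extractor interface; previousTimestampMs is an assumption of this sketch.
interface LegacyExtractor {
    long extract(LegacyAwareRecord record, long previousTimestampMs);
}

final class EmbeddedOrPayloadExtractor implements LegacyExtractor {
    @Override
    public long extract(LegacyAwareRecord record, long previousTimestampMs) {
        if (record.embeddedTimestampMs >= 0) {
            return record.embeddedTimestampMs; // normal case: the record carries its own timestamp
        }
        if (record.payloadTimestampMs != null) {
            return record.payloadTimestampMs; // legacy data: fall back to a field in the value
        }
        return previousTimestampMs; // nothing usable: reuse the last extracted timestamp
    }
}
```

Because the embedded timestamp wins whenever it is present, such an extractor behaves like the default for new data and only changes behavior for legacy records.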
Re: [DISCUSS] KIP-123: Allow per stream/table timestamp extractor

Posted by "Matthias J. Sax" <ma...@confluent.io>.
Eno,

I think this problem is out-of-scope and also present in the current
setting. We cannot prevent a custom timestamp extractor from using an
if-else branch and returning different timestamps for different topics.
That is possible even right now.

Furthermore, the TimestampExtractor interface states:

> The extracted timestamp MUST represent the milliseconds since midnight, January 1, 1970 UTC.

If users don't follow this, there is nothing we can do about it.


-Matthias


On 2/28/17 7:47 AM, Jeyhun Karimov wrote:
> Hi Eno,
> 
> Thanks for the clarification. I think it is allowed by definition. So if we
> want to join a stream that uses wallclock time with a stream that uses
> event time, we can assign the first one a timestamp extractor that
> returns the system clock, and assign the second stream a timestamp
> extractor that extracts/computes the event time from the record.
> 
> Cheers,
> Jeyhun

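Matthias's observation that a single extractor can already branch on the topic, together with the milliseconds-since-epoch contract, can be sketched as follows. The topic name `legacy-events`, the seconds-based payload field, and the `TopicRecord`/`TopicAwareExtractor` types are hypothetical.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical record shape; not Kafka's ConsumerRecord.
final class TopicRecord {
    final String topic;
    final long embeddedTimestampMs; // ms since epoch
    final long payloadEpochSeconds; // hypothetical legacy field: event time in seconds

    TopicRecord(String topic, long embeddedTimestampMs, long payloadEpochSeconds) {
        this.topic = topic;
        this.embeddedTimestampMs = embeddedTimestampMs;
        this.payloadEpochSeconds = payloadEpochSeconds;
    }
}

interface TopicAwareExtractor {
    long extract(TopicRecord record);
}

// One global extractor that branches on the topic name.
final class BranchingExtractor implements TopicAwareExtractor {
    @Override
    public long extract(TopicRecord record) {
        if ("legacy-events".equals(record.topic)) {
            // The contract asks for milliseconds since the epoch, so convert from seconds.
            return TimeUnit.SECONDS.toMillis(record.payloadEpochSeconds);
        }
        return record.embeddedTimestampMs;
    }
}
```

Nothing here checks which notion of time each branch represents; the choice is already in the user's hands, which is why Matthias considers the mixing concern out of scope for this KIP.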

Re: [DISCUSS] KIP-123: Allow per stream/table timestamp extractor

Posted by Jeyhun Karimov <je...@gmail.com>.
Hi Eno,

Thanks for the clarification. I think it is allowed by definition. So if we
want to join a stream that uses wallclock time with a stream that uses
event time, we can assign the first one a timestamp extractor that
returns the system clock, and assign the second stream a timestamp
extractor that extracts/computes the event time from the record.

Cheers,
Jeyhun

On Tue, Feb 28, 2017 at 11:40 AM Eno Thereska <en...@gmail.com>
wrote:

> Hi Jeyhun,
>
> I mean something slightly different. In your motivation you say "joining
> multiple streams/tables that require different timestamp extraction
> methods". I want to understand the scope of this. Is it allowed to have a
> stream that uses wallclock time join a stream that uses event time? (It
> would be good to give some examples in the motivation about scenarios you
> envision). If the join is not allowed, how do you prevent that join from
> happening? Do you throw an exception?
>
> Thanks
> Eno
-- 
-Cheers

Jeyhun
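Jeyhun's wallclock/event-time example can be sketched as two extractors, one per joined input. Kafka Streams does ship a `WallclockTimestampExtractor`; the classes below are simplified standalone stand-ins built on a hypothetical `Rec` type and `Extractor` interface, and the injectable clock exists only to make the sketch testable.

```java
import java.util.function.LongSupplier;

// Hypothetical stand-in for a consumed record: only the fields this sketch needs.
final class Rec {
    final String topic;
    final long eventTimeMs; // event time carried in the payload, ms since epoch

    Rec(String topic, long eventTimeMs) {
        this.topic = topic;
        this.eventTimeMs = eventTimeMs;
    }
}

// Hypothetical extractor interface, loosely modeled on TimestampExtractor.
interface Extractor {
    long extract(Rec record);
}

// Processing-time semantics: ignore the record and return the current wall-clock time.
final class WallclockExtractor implements Extractor {
    private final LongSupplier clock;

    WallclockExtractor(LongSupplier clock) { // injectable clock for testing
        this.clock = clock;
    }

    WallclockExtractor() {
        this(System::currentTimeMillis);
    }

    @Override
    public long extract(Rec record) {
        return clock.getAsLong();
    }
}

// Event-time semantics: read the timestamp from the record payload.
final class PayloadEventTimeExtractor implements Extractor {
    @Override
    public long extract(Rec record) {
        return record.eventTimeMs;
    }
}
```

With per-input extractors, the first stream would be configured with the wallclock variant and the second with the payload variant; with a single global extractor, both inputs would be forced onto the same semantics.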

Re: [DISCUSS] KIP-123: Allow per stream/table timestamp extractor

Posted by Eno Thereska <en...@gmail.com>.
Hi Jeyhun,

I mean something slightly different. In your motivation you say "joining multiple streams/tables that require different timestamp extraction methods". I want to understand the scope of this. Is a stream that uses wallclock time allowed to join a stream that uses event time? (It would be good to give some examples in the motivation of the scenarios you envision.) If the join is not allowed, how do you prevent that join from happening? Do you throw an exception?

Thanks
Eno


> On 28 Feb 2017, at 10:04, Jeyhun Karimov <je...@gmail.com> wrote:
> 
> Hi Eno,
> 
> Thanks for the feedback. I think you mean [1]. In this KIP we do not consider
> the situations you mentioned. So, we can either extend the KIP and solve the
> mentioned issues, or submit two PRs incrementally.
> 
> [1] https://issues.apache.org/jira/browse/KAFKA-4785
> 
> 
> Cheers,
> Jeyhun

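To make Eno's join question concrete, here is a sketch of a symmetric windowed join over in-memory lists, assuming (as with a stream-stream join window) that two records with the same key pair up only when their timestamps differ by at most the window size. It is not the Kafka Streams join implementation; `TimedValue` and `WindowedJoinSketch` are hypothetical. It only shows why mixing clocks matters: event-time records from the past and wallclock timestamps taken at processing time can fall outside the window, so the join silently produces nothing rather than failing.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical keyed record with an already-extracted timestamp.
final class TimedValue {
    final String key;
    final long timestampMs;

    TimedValue(String key, long timestampMs) {
        this.key = key;
        this.timestampMs = timestampMs;
    }
}

final class WindowedJoinSketch {
    // Pair records with equal keys whose timestamps differ by at most windowMs.
    static List<String> join(List<TimedValue> left, List<TimedValue> right, long windowMs) {
        List<String> out = new ArrayList<>();
        for (TimedValue l : left) {
            for (TimedValue r : right) {
                if (l.key.equals(r.key) && Math.abs(l.timestampMs - r.timestampMs) <= windowMs) {
                    out.add(l.key + ":" + l.timestampMs + "/" + r.timestampMs);
                }
            }
        }
        return out;
    }
}
```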

Re: [DISCUSS] KIP-123: Allow per stream/table timestamp extractor

Posted by Jeyhun Karimov <je...@gmail.com>.
Hi Eno,

Thanks for the feedback. I think you mean [1]. In this KIP we do not consider
the situations you mentioned. So, we can either extend the KIP and solve the
mentioned issues, or submit two PRs incrementally.

[1] https://issues.apache.org/jira/browse/KAFKA-4785


Cheers,
Jeyhun

On Tue, Feb 28, 2017 at 10:41 AM Eno Thereska <en...@gmail.com>
wrote:

> Hi Jeyhun,
>
> Thanks for the KIP; sorry I'm coming a bit late to the discussion.
>
> One thing I'd like to understand is whether we can avoid situations where
> the user inadvertently mixes different times (event time vs. wallclock time)
> in their processing. Before this KIP, all the relevant topics have one
> timestamp extractor, so that issue does not come up.
>
> What will be the behavior if times mismatch, e.g., for joins?
>
> Thanks
> Eno
>
> > On 22 Feb 2017, at 09:21, Jeyhun Karimov <je...@gmail.com> wrote:
> >
> > Dear community,
> >
> > I would like to get further feedback on this KIP (if any).
> >
> > Cheers
> > Jeyhun
> >
> > On Wed, Feb 15, 2017 at 2:36 AM Matthias J. Sax <ma...@confluent.io>
> > wrote:
> >
> >> Mathieu,
> >>
> >> I personally agree with your observation, and we have plans to submit a
> >> KIP like this. If you want to drive this discussion feel free to start
> >> the KIP by yourself!
> >>
> >> Having said that, for this KIP we might want to focus the discussion on
> >> the actual feature that gets added: allowing users to specify different
> >> TS-Extractors for different inputs.
> >>
> >>
> >>
> >> -Matthias
> >>
> >> On 2/14/17 4:54 PM, Mathieu Fenniak wrote:
> >>> Hi Jeyhun,
> >>>
> >>> This KIP might not be the appropriate time, but my first thought reading it
> >>> is that it might make sense to introduce a builder-style API rather than
> >>> adding a mix of new method overloads with independent optional parameters.
> >>> :-)
> >>>
> >>> E.g., stream(), table(), globalTable(), and addSource() could all accept a
> >>> "TopicReference" parameter that can be built like:
> >>> TopicReference("my-topic").keySerde(...).valueSerde(...).autoOffsetReset(...).timestampExtractor(...).build().
> >>>
> >>> Mathieu
> >>>
> >>>
> >>> On Tue, Feb 14, 2017 at 5:31 PM, Jeyhun Karimov <je...@gmail.com>
> >>> wrote:
> >>>
> >>>> Dear community,
> >>>>
> >>>> I want to share KIP-123 [1], which is based on issue KAFKA-4144 [2]. You
> >>>> can check the PR in [3].
> >>>>
> >>>> I would like to get your comments.
> >>>>
> >>>> [1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=68714788
> >>>> [2] https://issues.apache.org/jira/browse/KAFKA-4144
> >>>> [3] https://github.com/apache/kafka/pull/2466
> >>>>
> >>>>
> >>>> Cheers,
> >>>> Jeyhun

Re: [DISCUSS] KIP-123: Allow per stream/table timestamp extractor

Posted by Eno Thereska <en...@gmail.com>.
Hi Jeyhun,

Thanks for the KIP, sorry I'm coming a bit late to the discussion.

One thing I'd like to understand is whether we can avoid situations where the user inadvertently mixes different notions of time (event time vs. wallclock time) in their processing. Before this KIP, all relevant topics share one timestamp extractor, so that issue does not come up.

What will be the behavior if times mismatch, e.g., for joins?
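To make the concern concrete, here is a minimal, self-contained Java sketch (not Kafka Streams code) of what can go wrong. Two records that describe the same moment are checked against a 5-second join window, once with both sides using event time and once with one side using wallclock time. The `Extractor` interface, the `Event` class, and `withinJoinWindow` are hypothetical simplifications; the real Kafka Streams join and TimestampExtractor are more involved.

```java
import java.time.Instant;

public class TimeSemanticsMismatch {

    // Hypothetical stand-in for a timestamp extractor: maps a record to a timestamp in ms.
    interface Extractor<V> {
        long extract(V value, long wallClockMs);
    }

    // Hypothetical record type whose payload carries its own event time.
    static final class Event {
        final String key;
        final long eventTimeMs;

        Event(String key, long eventTimeMs) {
            this.key = key;
            this.eventTimeMs = eventTimeMs;
        }
    }

    // True if the two timestamps are at most windowMs apart (a symmetric join window).
    static boolean withinJoinWindow(long leftTs, long rightTs, long windowMs) {
        return Math.abs(leftTs - rightTs) <= windowMs;
    }

    public static void main(String[] args) {
        Extractor<Event> eventTime = (v, clockMs) -> v.eventTimeMs; // event time from the payload
        Extractor<Event> wallClock = (v, clockMs) -> clockMs;       // processing (wallclock) time

        long now = Instant.parse("2017-02-28T10:41:00Z").toEpochMilli();
        // Both records describe the same moment, one hour before they are processed.
        Event left = new Event("k", now - 3_600_000L);
        Event right = new Event("k", now - 3_600_000L);
        long windowMs = 5_000L;

        boolean sameSemantics = withinJoinWindow(
                eventTime.extract(left, now), eventTime.extract(right, now), windowMs);
        boolean mixedSemantics = withinJoinWindow(
                eventTime.extract(left, now), wallClock.extract(right, now), windowMs);

        System.out.println("both event time -> join: " + sameSemantics);            // true
        System.out.println("event time vs. wallclock -> join: " + mixedSemantics); // false
    }
}
```

The records never meet in the mixed case, even though they refer to the same instant, because the wallclock side is an hour "later" than the event-time side.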

Thanks
Eno

> On 22 Feb 2017, at 09:21, Jeyhun Karimov <je...@gmail.com> wrote:
> 
> Dear community,
> 
> I would like to get further feedback on this KIP (if any).
> 
> Cheers
> Jeyhun
> 
> On Wed, Feb 15, 2017 at 2:36 AM Matthias J. Sax <ma...@confluent.io>
> wrote:
> 
>> Mathieu,
>> 
>> I personally agree with your observation, and we have plans to submit a
>> KIP like this. If you want to drive this discussion feel free to start
>> the KIP by yourself!
>> 
>> Having said that, for this KIP we might want to focus the discussion on
>> the actual feature that gets added: allowing users to specify different
>> TS-Extractors for different inputs.
>> 
>> 
>> 
>> -Matthias
>> 
>> On 2/14/17 4:54 PM, Mathieu Fenniak wrote:
>>> Hi Jeyhun,
>>> 
>>> This KIP might not be the appropriate time, but my first thought reading it
>>> is that it might make sense to introduce a builder-style API rather than
>>> adding a mix of new method overloads with independent optional parameters.
>>> :-)
>>>
>>> E.g., stream(), table(), globalTable(), and addSource() could all accept a
>>> "TopicReference" parameter that can be built like:
>>> TopicReference("my-topic").keySerde(...).valueSerde(...).autoOffsetReset(...).timestampExtractor(...).build().
>>> 
>>> Mathieu
>>> 
>>> 
>>> On Tue, Feb 14, 2017 at 5:31 PM, Jeyhun Karimov <je...@gmail.com>
>>> wrote:
>>> 
>>>> Dear community,
>>>> 
>>>> I want to share KIP-123 [1], which is based on issue KAFKA-4144 [2]. You
>>>> can check the PR in [3].
>>>> 
>>>> I would like to get your comments.
>>>> 
>>>> [1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=68714788
>>>> [2] https://issues.apache.org/jira/browse/KAFKA-4144
>>>> [3] https://github.com/apache/kafka/pull/2466
>>>> 
>>>> 
>>>> Cheers,
>>>> Jeyhun


Re: [DISCUSS] KIP-123: Allow per stream/table timestamp extractor

Posted by Jeyhun Karimov <je...@gmail.com>.
Dear community,

I would like to get further feedback on this KIP (if any).

Cheers
Jeyhun

On Wed, Feb 15, 2017 at 2:36 AM Matthias J. Sax <ma...@confluent.io>
wrote:

> Mathieu,
>
> I personally agree with your observation, and we have plans to submit a
> KIP like this. If you want to drive this discussion feel free to start
> the KIP by yourself!
>
> Having said that, for this KIP we might want to focus the discussion on
> the actual feature that gets added: allowing users to specify different
> TS-Extractors for different inputs.
>
>
>
> -Matthias
>
> On 2/14/17 4:54 PM, Mathieu Fenniak wrote:
> > Hi Jeyhun,
> >
> > This KIP might not be the appropriate time, but my first thought reading it
> > is that it might make sense to introduce a builder-style API rather than
> > adding a mix of new method overloads with independent optional parameters.
> > :-)
> >
> > E.g., stream(), table(), globalTable(), and addSource() could all accept a
> > "TopicReference" parameter that can be built like:
> > TopicReference("my-topic").keySerde(...).valueSerde(...).autoOffsetReset(...).timestampExtractor(...).build().
> >
> > Mathieu
> >
> >
> > On Tue, Feb 14, 2017 at 5:31 PM, Jeyhun Karimov <je...@gmail.com>
> > wrote:
> >
> >> Dear community,
> >>
> >> I want to share KIP-123 [1], which is based on issue KAFKA-4144 [2]. You
> >> can check the PR in [3].
> >>
> >> I would like to get your comments.
> >>
> >> [1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=68714788
> >> [2] https://issues.apache.org/jira/browse/KAFKA-4144
> >> [3] https://github.com/apache/kafka/pull/2466
> >>
> >>
> >> Cheers,
> >> Jeyhun

Re: [DISCUSS] KIP-123: Allow per stream/table timestamp extractor

Posted by "Matthias J. Sax" <ma...@confluent.io>.
Mathieu,

I personally agree with your observation, and we have plans to submit a
KIP like this. If you want to drive this discussion feel free to start
the KIP by yourself!

Having said that, for this KIP we might want to focus the discussion on
the actual feature that gets added: allowing users to specify different
TS-Extractors for different inputs.
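As an illustration of the feature under discussion, here is a minimal Java sketch, assuming a per-topic override that falls back to one globally configured extractor when no override is given. `PerSourceExtractors`, its `Extractor` interface, and the `RECORD_TIME` / `WALL_CLOCK` instances are hypothetical names for this sketch only; they are not the API proposed in the KIP or the PR, and Kafka's real TimestampExtractor works on a ConsumerRecord rather than raw longs.

```java
import java.util.HashMap;
import java.util.Map;

public class PerSourceExtractors {

    // Hypothetical, simplified extractor (Kafka's real TimestampExtractor takes a ConsumerRecord).
    interface Extractor {
        long extract(String topic, long recordTimestampMs, long wallClockMs);
    }

    // Uses the timestamp stored in the record itself.
    static final Extractor RECORD_TIME = (topic, recordTs, clockMs) -> recordTs;
    // Ignores the record and uses processing (wallclock) time.
    static final Extractor WALL_CLOCK = (topic, recordTs, clockMs) -> clockMs;

    private final Extractor defaultExtractor;
    private final Map<String, Extractor> perTopic = new HashMap<>();

    PerSourceExtractors(Extractor defaultExtractor) {
        this.defaultExtractor = defaultExtractor;
    }

    // Registers an override for one input topic.
    PerSourceExtractors withExtractor(String topic, Extractor extractor) {
        perTopic.put(topic, extractor);
        return this;
    }

    // Uses the topic-specific extractor if one was registered, otherwise the global default.
    long timestampFor(String topic, long recordTs, long clockMs) {
        return perTopic.getOrDefault(topic, defaultExtractor).extract(topic, recordTs, clockMs);
    }

    public static void main(String[] args) {
        PerSourceExtractors sources = new PerSourceExtractors(RECORD_TIME)
                .withExtractor("clicks", WALL_CLOCK);
        System.out.println(sources.timestampFor("orders", 1_000L, 9_000L)); // 1000 (default)
        System.out.println(sources.timestampFor("clicks", 1_000L, 9_000L)); // 9000 (override)
    }
}
```

In this sketch an unset override never changes behaviour: topics without an explicit extractor keep using the default, which matches the pre-KIP situation where all inputs share one extractor.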



-Matthias

On 2/14/17 4:54 PM, Mathieu Fenniak wrote:
> Hi Jeyhun,
> 
> This KIP might not be the appropriate time, but my first thought reading it
> is that it might make sense to introduce a builder-style API rather than
> adding a mix of new method overloads with independent optional parameters.
> :-)
> 
> E.g., stream(), table(), globalTable(), and addSource() could all accept a
> "TopicReference" parameter that can be built like:
> TopicReference("my-topic").keySerde(...).valueSerde(...).autoOffsetReset(...).timestampExtractor(...).build().
> 
> Mathieu
> 
> 
> On Tue, Feb 14, 2017 at 5:31 PM, Jeyhun Karimov <je...@gmail.com>
> wrote:
> 
>> Dear community,
>>
>> I want to share KIP-123 [1], which is based on issue KAFKA-4144 [2]. You
>> can check the PR in [3].
>>
>> I would like to get your comments.
>>
>> [1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=68714788
>> [2] https://issues.apache.org/jira/browse/KAFKA-4144
>> [3] https://github.com/apache/kafka/pull/2466
>>
>>
>> Cheers,
>> Jeyhun
>>
> 


Re: [DISCUSS] KIP-123: Allow per stream/table timestamp extractor

Posted by Mathieu Fenniak <ma...@replicon.com>.
Hi Jeyhun,

This KIP might not be the appropriate time, but my first thought reading it
is that it might make sense to introduce a builder-style API rather than
adding a mix of new method overloads with independent optional parameters.
:-)

E.g., stream(), table(), globalTable(), and addSource() could all accept a
"TopicReference" parameter that can be built like:
TopicReference("my-topic").keySerde(...).valueSerde(...).autoOffsetReset(...).timestampExtractor(...).build().
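One way the builder idea could look in Java, as a hypothetical sketch only: `TopicReference`, `builder(...)` and `OffsetReset` are made-up names, and serdes and the extractor are represented by class-name strings so the example compiles without Kafka on the classpath. Unset options come back as empty `Optional`s, so a consuming method could fall back to the configured defaults for anything the user did not set.

```java
import java.util.Objects;
import java.util.Optional;

// Hypothetical sketch of the proposed "TopicReference" builder; not part of the Kafka Streams API.
// Serdes and the timestamp extractor are represented by class names (Strings) for self-containment.
public final class TopicReference {

    public enum OffsetReset { EARLIEST, LATEST }

    private final String topic;
    private final String keySerde;            // null means "use the configured default"
    private final String valueSerde;
    private final OffsetReset autoOffsetReset;
    private final String timestampExtractor;

    private TopicReference(Builder b) {
        this.topic = b.topic;
        this.keySerde = b.keySerde;
        this.valueSerde = b.valueSerde;
        this.autoOffsetReset = b.autoOffsetReset;
        this.timestampExtractor = b.timestampExtractor;
    }

    public static Builder builder(String topic) {
        return new Builder(topic);
    }

    public String topic() { return topic; }
    public Optional<String> keySerde() { return Optional.ofNullable(keySerde); }
    public Optional<String> valueSerde() { return Optional.ofNullable(valueSerde); }
    public Optional<OffsetReset> autoOffsetReset() { return Optional.ofNullable(autoOffsetReset); }
    public Optional<String> timestampExtractor() { return Optional.ofNullable(timestampExtractor); }

    // Each optional setting gets its own fluent method instead of its own method overload.
    public static final class Builder {
        private final String topic;
        private String keySerde;
        private String valueSerde;
        private OffsetReset autoOffsetReset;
        private String timestampExtractor;

        private Builder(String topic) {
            this.topic = Objects.requireNonNull(topic, "topic");
        }

        public Builder keySerde(String serdeClass) { this.keySerde = serdeClass; return this; }
        public Builder valueSerde(String serdeClass) { this.valueSerde = serdeClass; return this; }
        public Builder autoOffsetReset(OffsetReset reset) { this.autoOffsetReset = reset; return this; }
        public Builder timestampExtractor(String extractorClass) { this.timestampExtractor = extractorClass; return this; }

        public TopicReference build() { return new TopicReference(this); }
    }

    public static void main(String[] args) {
        TopicReference ref = TopicReference.builder("my-topic")
                .timestampExtractor("com.example.MyEventTimeExtractor") // hypothetical class name
                .autoOffsetReset(OffsetReset.EARLIEST)
                .build();
        System.out.println(ref.topic() + " " + ref.timestampExtractor().orElse("<default>")
                + " " + ref.keySerde().orElse("<default>"));
    }
}
```

Compared with one overload per combination of optional parameters, each new option in this sketch only adds one fluent method on the builder.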

Mathieu


On Tue, Feb 14, 2017 at 5:31 PM, Jeyhun Karimov <je...@gmail.com>
wrote:

> Dear community,
>
> I want to share KIP-123 [1], which is based on issue KAFKA-4144 [2]. You
> can check the PR in [3].
>
> I would like to get your comments.
>
> [1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=68714788
> [2] https://issues.apache.org/jira/browse/KAFKA-4144
> [3] https://github.com/apache/kafka/pull/2466
>
>
> Cheers,
> Jeyhun
>