Posted to dev@kafka.apache.org by Guozhang Wang <wa...@gmail.com> on 2017/05/05 23:58:36 UTC

Re: [DISCUSS] KIP-123: Allow per stream/table timestamp extractor

Hi Ewen,

Personally, I'm thinking of removing deprecated APIs after one minor release,
so yes, 0.12.0.0 would be the preferred timeline.


Guozhang

On Thu, Apr 6, 2017 at 8:40 PM, Ewen Cheslack-Postava <ew...@confluent.io>
wrote:

> I agree there's a potential issue that Eno raises, but probably it's not
> worth worrying about trying to enforce it. It seems like we have the right
> default for long term and for most cases a custom timestamp extractor
> should only be needed for legacy data that isn't putting a timestamp in the
> record. I think that if the user chooses to customize the timestamp
> extractor, it's reasonable to expect them to read enough docs to understand
> the implications. (By the way, this is also, I think, a case where fewer
> overloads are fine because it should be unusual and a relatively advanced
> use case, so expecting the user to fill in some extra parameters even if
> they are using the default values is ok in order to keep the API tighter.
> However, if we're heading towards a builder API that won't matter anyway.)
>
> One other question I have is what the planned removal timeline is for the
> deprecated configs -- the KIP just says they will be deprecated but not
> when they will be removed. Specifying this is important for users so they
> can plan how aggressively they need to switch. Generally removal is tied to
> major releases, so I assume the thought is to remove in 0.12.0.0?
>
> -Ewen
>
> On Tue, Feb 28, 2017 at 9:21 AM, Matthias J. Sax <ma...@confluent.io>
> wrote:
>
> > Eno,
> >
> > I think this problem is out of scope and also present in the current
> > setting. We cannot prevent a custom timestamp extractor from using an
> > if-else branch and returning different timestamps for different topics.
> > That is possible even right now.
> >
> > Furthermore, the TimestampExtractor interface states:
> >
> > > The extracted timestamp MUST represent the milliseconds since midnight,
> > > January 1, 1970 UTC.
> >
> > If users don't follow this, there is nothing we can do about it.
> >
> >
> > -Matthias
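
The point about branching inside a single extractor can be illustrated with a minimal sketch. The `Extractor` interface below is a hypothetical, simplified stand-in for the Kafka Streams `TimestampExtractor` (the real interface operates on a `ConsumerRecord`), and the topic name "clicks" is made up for the example.

```java
// Hypothetical, simplified stand-in for Kafka Streams' TimestampExtractor;
// the real interface receives a ConsumerRecord rather than these arguments.
interface Extractor {
    // Returns milliseconds since midnight, January 1, 1970 UTC.
    long extract(String topic, long embeddedTimestampMs);
}

// One global extractor that branches on the topic name, so different
// topics already end up with different time semantics.
final class BranchingExtractor implements Extractor {
    @Override
    public long extract(String topic, long embeddedTimestampMs) {
        if ("clicks".equals(topic)) {          // hypothetical topic name
            return embeddedTimestampMs;        // event time from the record
        } else {
            return System.currentTimeMillis(); // wallclock time
        }
    }
}
```

Nothing in such an extractor can be checked by the framework: both branches return a valid epoch-millisecond value, so mixing time semantics is up to the user.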
> >
> >
> > On 2/28/17 7:47 AM, Jeyhun Karimov wrote:
> > > Hi Eno,
> > >
> > > Thanks for the clarification. I think it is allowed by definition. So if we
> > > want to join a stream that uses wallclock time with a stream that uses
> > > event time, we can assign the first one a timestamp extractor that
> > > returns the system clock, and assign the second one a timestamp
> > > extractor that extracts or computes the event time from the record.
> > >
> > > Cheers,
> > > Jeyhun
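
The per-stream assignment described here could be modeled roughly as follows. This is only a sketch of the idea, not the Kafka Streams API: `PerSourceExtractors`, `assign`, and `timestampFor` are hypothetical names, and each extractor is reduced to a function from the record's embedded timestamp to the timestamp used for processing.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongUnaryOperator;

// Hypothetical registry mapping each source topic to its own extractor.
// It only models the idea discussed in the thread.
final class PerSourceExtractors {
    private final Map<String, LongUnaryOperator> byTopic = new HashMap<>();
    private final LongUnaryOperator fallback;

    // The fallback plays the role of the globally configured default extractor.
    PerSourceExtractors(LongUnaryOperator fallback) {
        this.fallback = fallback;
    }

    // Assigns a dedicated extractor to one source topic.
    PerSourceExtractors assign(String topic, LongUnaryOperator extractor) {
        byTopic.put(topic, extractor);
        return this;
    }

    // Uses the topic's own extractor if one was assigned, else the default.
    long timestampFor(String topic, long embeddedTimestampMs) {
        return byTopic.getOrDefault(topic, fallback).applyAsLong(embeddedTimestampMs);
    }
}
```

For example, `new PerSourceExtractors(ts -> ts).assign("wallclock-stream", ts -> System.currentTimeMillis())` would keep event time as the default and switch one (hypothetically named) topic to wallclock time.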
> > >
> > > On Tue, Feb 28, 2017 at 11:40 AM Eno Thereska <en...@gmail.com>
> > > wrote:
> > >
> > >> Hi Jeyhun,
> > >>
> > >> I mean something slightly different. In your motivation you say "joining
> > >> multiple streams/tables that require different timestamp extraction
> > >> methods". I want to understand the scope of this. Is it allowed to have a
> > >> stream that uses wallclock time join a stream that uses event time? (It
> > >> would be good to give some examples in the motivation about scenarios you
> > >> envision.) If the join is not allowed, how do you prevent that join from
> > >> happening? Do you throw an exception?
> > >>
> > >> Thanks
> > >> Eno
> > >>
> > >>
> > >>> On 28 Feb 2017, at 10:04, Jeyhun Karimov <je...@gmail.com> wrote:
> > >>>
> > >>> Hi Eno,
> > >>>
> > >>> Thanks for the feedback. I think you mean [1]. In this KIP we do not consider
> > >>> the situations you mentioned. So we can either extend the KIP to solve the
> > >>> mentioned issues or submit two PRs incrementally.
> > >>>
> > >>> [1] https://issues.apache.org/jira/browse/KAFKA-4785
> > >>>
> > >>>
> > >>> Cheers,
> > >>> Jeyhun
> > >>>
> > >>> On Tue, Feb 28, 2017 at 10:41 AM Eno Thereska <eno.thereska@gmail.com>
> > >>> wrote:
> > >>>
> > >>>> Hi Jeyhun,
> > >>>>
> > >>>> Thanks for the KIP, sorry I'm coming a bit late to the discussion.
> > >>>>
> > >>>> One thing I'd like to understand is whether we can avoid situations where
> > >>>> the user is inadvertently mixing different times (event time vs. wallclock
> > >>>> time) in their processing. Before this KIP, all the relevant topics have one
> > >>>> timestamp extractor, so that issue does not come up.
> > >>>>
> > >>>> What will be the behavior if times mismatch, e.g., for joins?
> > >>>>
> > >>>> Thanks
> > >>>> Eno
> > >>>>
> > >>>>> On 22 Feb 2017, at 09:21, Jeyhun Karimov <je...@gmail.com> wrote:
> > >>>>>
> > >>>>> Dear community,
> > >>>>>
> > >>>>> I would like to get further feedback on this KIP (if any).
> > >>>>>
> > >>>>> Cheers
> > >>>>> Jeyhun
> > >>>>>
> > >>>>> On Wed, Feb 15, 2017 at 2:36 AM Matthias J. Sax <matthias@confluent.io>
> > >>>>> wrote:
> > >>>>>
> > >>>>>> Mathieu,
> > >>>>>>
> > >>>>>> I personally agree with your observation, and we have plans to submit a
> > >>>>>> KIP like this. If you want to drive this discussion, feel free to start
> > >>>>>> the KIP yourself!
> > >>>>>>
> > >>>>>> Having said that, for this KIP we might want to focus the discussion on
> > >>>>>> the actual feature that gets added: allowing users to specify a different
> > >>>>>> TS-Extractor for different inputs.
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>> -Matthias
> > >>>>>>
> > >>>>>> On 2/14/17 4:54 PM, Mathieu Fenniak wrote:
> > >>>>>>> Hi Jeyhun,
> > >>>>>>>
> > >>>>>>> This KIP might not be the appropriate time, but my first thought reading it
> > >>>>>>> is that it might make sense to introduce a builder-style API rather than
> > >>>>>>> adding a mix of new method overloads with independent optional parameters.
> > >>>>>>> :-)
> > >>>>>>>
> > >>>>>>> E.g., stream(), table(), globalTable(), and addSource() could all accept a
> > >>>>>>> "TopicReference" parameter that can be built like:
> > >>>>>>>
> > >>>>>>> TopicReference("my-topic").keySerde(...).valueSerde(...).autoOffsetReset(...).timestampExtractor(...).build().
> > >>>>>>>
> > >>>>>>> Mathieu
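
A rough sketch of what such a builder could look like in Java follows. `TopicReference` is the name proposed in this message and is not an existing Kafka Streams class; the static factory `named(...)`, the "default" values, and the `String`-typed fields (standing in for real serde, reset-policy, and extractor types) are all assumptions made for illustration.

```java
// Hypothetical sketch: TopicReference is the name proposed in this thread,
// not an existing Kafka Streams class. String fields are placeholders for
// real Serde / offset-reset / TimestampExtractor types.
final class TopicReference {
    final String topic;
    final String keySerde;
    final String valueSerde;
    final String autoOffsetReset;
    final String timestampExtractor;

    private TopicReference(Builder b) {
        this.topic = b.topic;
        this.keySerde = b.keySerde;
        this.valueSerde = b.valueSerde;
        this.autoOffsetReset = b.autoOffsetReset;
        this.timestampExtractor = b.timestampExtractor;
    }

    // Entry point; every optional setting starts at a placeholder default.
    static Builder named(String topic) {
        return new Builder(topic);
    }

    static final class Builder {
        private final String topic;
        private String keySerde = "default";
        private String valueSerde = "default";
        private String autoOffsetReset = "default";
        private String timestampExtractor = "default";

        private Builder(String topic) { this.topic = topic; }

        Builder keySerde(String s) { this.keySerde = s; return this; }
        Builder valueSerde(String s) { this.valueSerde = s; return this; }
        Builder autoOffsetReset(String s) { this.autoOffsetReset = s; return this; }
        Builder timestampExtractor(String s) { this.timestampExtractor = s; return this; }

        TopicReference build() { return new TopicReference(this); }
    }
}
```

With this shape, a caller sets only the options that differ from the defaults, e.g. `TopicReference.named("my-topic").timestampExtractor("wallclock").build()`, which is the property that avoids a growing set of method overloads.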
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> On Tue, Feb 14, 2017 at 5:31 PM, Jeyhun Karimov <je.karimov@gmail.com>
> > >>>>>>> wrote:
> > >>>>>>>
> > >>>>>>>> Dear community,
> > >>>>>>>>
> > >>>>>>>> I want to share KIP-123 [1], which is based on issue KAFKA-4144 [2]. You
> > >>>>>>>> can check the PR in [3].
> > >>>>>>>>
> > >>>>>>>> I would like to get your comments.
> > >>>>>>>>
> > >>>>>>>> [1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=68714788
> > >>>>>>>> [2] https://issues.apache.org/jira/browse/KAFKA-4144
> > >>>>>>>> [3] https://github.com/apache/kafka/pull/2466
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> Cheers,
> > >>>>>>>> Jeyhun


