Posted to dev@kafka.apache.org by Jorge Esteban Quilcate Otoya <qu...@gmail.com> on 2023/01/26 15:35:20 UTC

Re: [DISCUSS] KIP-864: Add End-To-End Latency Metrics to Connectors

No worries, thanks Chris!

I think most feedback has been covered and the KIP is ready for vote. Will
be starting the vote thread soon.

Cheers,
Jorge.

On Mon, 5 Dec 2022 at 15:10, Chris Egerton <ch...@aiven.io.invalid> wrote:

> Hi Jorge,
>
> Thanks for indulging my paranoia. LGTM!
>
> Cheers,
>
> Chris
>
> On Mon, Dec 5, 2022 at 10:06 AM Jorge Esteban Quilcate Otoya <
> quilcate.jorge@gmail.com> wrote:
>
> > Sure! I have added the following to the proposed changes section:
> >
> > ```
> > The per-record metrics will definitely be added to Kafka Connect as part of
> > this KIP, but their metric level will be changed pending the performance
> > testing described in KAFKA-14441, and they will otherwise only be exposed at a
> > lower level (DEBUG instead of INFO, and TRACE instead of DEBUG)
> >
> > Let me know how it looks.
> >
> > Many thanks!
> > Jorge.
> >
> > On Mon, 5 Dec 2022 at 14:11, Chris Egerton <ch...@aiven.io.invalid>
> > wrote:
> >
> > > Hi Jorge,
> > >
> > > Thanks for filing KAFKA-14441! In the ticket description we mention
> that
> > > "there will be more confidence whether to design metrics to be exposed
> > at a
> > > DEBUG or INFO level depending on their impact" but it doesn't seem like
> > > this is called out in the KIP and, just based on what's in the KIP, the
> > > proposal is still to have several per-record metrics exposed at INFO
> > level.
> > >
> > > Could we explicitly call out that the per-record metrics will
> definitely
> > be
> > > added to Kafka Connect as part of this KIP, but they will only be
> exposed
> > > at INFO level pending the performance testing described in
> > > KAFKA-14441, and will otherwise only be exposed at DEBUG level?
> > Otherwise,
> > > it's possible that a vote for the KIP as it's written today would be a
> > vote
> > > in favor of unconditionally exposing these metrics at INFO level, even
> if
> > > the performance testing reveals issues.
> > >
> > > Cheers,
> > >
> > > Chris
> > >
> > > On Sun, Dec 4, 2022 at 7:08 PM Jorge Esteban Quilcate Otoya <
> > > quilcate.jorge@gmail.com> wrote:
> > >
> > > > Thanks for the reminder Chris!
> > > >
> > > > I have added a note on the KIP to include this as part of the KIP, as most
> > > > of the metrics proposed are per-record and having them all at DEBUG would
> > > > limit the benefits. I have also created
> > > > https://issues.apache.org/jira/browse/KAFKA-14441
> > > > to keep track of this task.
> > > >
> > > > Cheers,
> > > > Jorge.
> > > >
> > > > On Tue, 29 Nov 2022 at 19:40, Chris Egerton <chrise@aiven.io.invalid
> >
> > > > wrote:
> > > >
> > > > > Hi Jorge,
> > > > >
> > > > > Thanks! What were your thoughts on the possible benchmarking and/or
> > > > > downgrading of per-record metrics to DEBUG?
> > > > >
> > > > > Cheers,
> > > > >
> > > > > Chris
> > > > >
> > > > > On Thu, Nov 24, 2022 at 8:20 AM Jorge Esteban Quilcate Otoya <
> > > > > quilcate.jorge@gmail.com> wrote:
> > > > >
> > > > > > Thanks Chris! I have updated the KIP with "transform" instead of
> > > > "alias".
> > > > > > Agree it's clearer.
> > > > > >
> > > > > > Cheers,
> > > > > > Jorge.
> > > > > >
> > > > > > On Mon, 21 Nov 2022 at 21:36, Chris Egerton
> > <chrise@aiven.io.invalid
> > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Jorge,
> > > > > > >
> > > > > > > Thanks for the updates, and apologies for the delay. The new
> > > diagram
> > > > > > > directly under the "Proposed Changes" section is absolutely
> > > gorgeous!
> > > > > > >
> > > > > > >
> > > > > > > Follow-ups:
> > > > > > >
> > > > > > > RE 2: Good point. We can use the same level for these metrics,
> > it's
> > > > > not a
> > > > > > > big deal.
> > > > > > >
> > > > > > > RE 3: As long as all the per-record metrics are kept at DEBUG
> > > level,
> > > > it
> > > > > > > should be fine to leave JMH benchmarking for a follow-up. If we
> > > want
> > > > to
> > > > > > add
> > > > > > > new per-record, INFO-level metrics, I would be more comfortable
> > > with
> > > > > > > including benchmarking as part of the testing plan for the KIP.
> > One
> > > > > > > possible compromise could be to propose that these features be
> > > merged
> > > > > at
> > > > > > > DEBUG level, and then possibly upgraded to INFO level in the
> > future
> > > > > > pending
> > > > > > > benchmarks to guard against performance degradation.
> > > > > > >
> > > > > > > RE 4: I think for a true "end-to-end" metric, it'd be useful to
> > > > include
> > > > > > the
> > > > > > > time taken by the task to actually deliver the record. However,
> > > with
> > > > > the
> > > > > > > new metric names and descriptions provided in the KIP, I have
> no
> > > > > > objections
> > > > > > > with what's currently proposed, and a new "end-to-end" metric
> can
> > > be
> > > > > > taken
> > > > > > > on later in a follow-up KIP.
> > > > > > >
> > > > > > > RE 6: You're right, existing producer metrics should be enough
> > for
> > > > now.
> > > > > > We
> > > > > > > can revisit this later if/when we add delivery-centric metrics
> > for
> > > > sink
> > > > > > > tasks as well.
> > > > > > >
> > > > > > > RE 7: The new metric names in the KIP LGTM; I don't see any
> need
> > to
> > > > > > expand
> > > > > > > beyond those but if you'd still like to pursue others, LMK.
> > > > > > >
> > > > > > >
> > > > > > > New thoughts:
> > > > > > >
> > > > > > > One small thought: instead of "alias" in
> > "alias="{transform_alias}"
> > > > for
> > > > > > the
> > > > > > > per-transform metrics, could we use "transform"? IMO it's
> clearer
> > > > since
> > > > > > we
> > > > > > > don't use "alias" in the names of transform-related properties,
> > and
> > > > > > "alias"
> > > > > > > may be confused with the classloading term where you can use,
> > e.g.,
> > > > > > > "FileStreamSource" as the name of a connector class in a
> > connector
> > > > > config
> > > > > > > instead of
> > > "org.apache.kafka.connect.file.FileStreamSourceConnector".
> > > > > > >
> > > > > > >
> > > > > > > Cheers,
> > > > > > >
> > > > > > > Chris
> > > > > > >
> > > > > > > On Fri, Nov 18, 2022 at 12:06 PM Jorge Esteban Quilcate Otoya <
> > > > > > > quilcate.jorge@gmail.com> wrote:
> > > > > > >
> > > > > > > > Thanks Mickael!
> > > > > > > >
> > > > > > > >
> > > > > > > > On Wed, 9 Nov 2022 at 15:54, Mickael Maison <
> > > > > mickael.maison@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi Jorge,
> > > > > > > > >
> > > > > > > > > Thanks for the KIP, it is a nice improvement.
> > > > > > > > >
> > > > > > > > > 1) The per transformation metrics still have a question
> mark
> > > next
> > > > > to
> > > > > > > > > them in the KIP. Do you want to include them? If so we'll
> > want
> > > to
> > > > > tag
> > > > > > > > > them, we should be able to include the aliases in
> > > > > TransformationChain
> > > > > > > > > and use them.
> > > > > > > > >
> > > > > > > >
> > > > > > > > Yes, I have added the changes to TransformationChain that will be
> > > > > > > > needed to add these metrics.
> > > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > 2) I see no references to predicates. If we don't want to
> > > measure
> > > > > > > > > their latency, can we say it explicitly?
> > > > > > > > >
> > > > > > > >
> > > > > > > > Good question, I haven't considered these. Though as these
> are
> > > > > > > materialized
> > > > > > > > as PredicatedTransformation, they should be covered by these
> > > > changes.
> > > > > > > > Adding a note about this.
> > > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > 3) Should we have sink-record-batch-latency-avg-ms? All
> other
> > > > > metrics
> > > > > > > > > have both the maximum and average values.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > Good question. I will remove it and change the record latency from
> > > > > > > > DEBUG->INFO as it already covers the maximum metric.
> > > > > > > >
> > > > > > > > Hope it's clearer now, let me know if there is any additional
> > > > > > > > feedback.
> > > > > > > > Thanks!
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Mickael
> > > > > > > > >
> > > > > > > > > On Thu, Oct 20, 2022 at 9:58 PM Jorge Esteban Quilcate
> Otoya
> > > > > > > > > <qu...@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > Thanks, Chris! Great feedback! Please, find my comments
> > > below:
> > > > > > > > > >
> > > > > > > > > > On Thu, 13 Oct 2022 at 18:52, Chris Egerton
> > > > > > <chrise@aiven.io.invalid
> > > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi Jorge,
> > > > > > > > > > >
> > > > > > > > > > > Thanks for the KIP. I agree with the overall direction
> > and
> > > > > think
> > > > > > > this
> > > > > > > > > would
> > > > > > > > > > > be a nice improvement to Kafka Connect. Here are my
> > initial
> > > > > > > thoughts
> > > > > > > > > on the
> > > > > > > > > > > details:
> > > > > > > > > > >
> > > > > > > > > > > 1. The motivation section outlines the gaps in Kafka
> > > > Connect's
> > > > > > task
> > > > > > > > > metrics
> > > > > > > > > > > nicely. I think it'd be useful to include more concrete
> > > > details
> > > > > > on
> > > > > > > > why
> > > > > > > > > > > these gaps need to be filled in, and in which cases
> > > > additional
> > > > > > > > metrics
> > > > > > > > > > > would be helpful. One goal could be to provide enhanced
> > > > > > monitoring
> > > > > > > of
> > > > > > > > > > > production deployments that allows for cluster
> > > administrators
> > > > > to
> > > > > > > set
> > > > > > > > up
> > > > > > > > > > > automatic alerts for latency spikes and, if triggered,
> > > > quickly
> > > > > > > > > identify the
> > > > > > > > > > > root cause of those alerts, reducing the time to
> > > remediation.
> > > > > > > Another
> > > > > > > > > goal
> > > > > > > > > > > could be to provide more insight to developers or
> cluster
> > > > > > > > > administrators
> > > > > > > > > > > who want to do performance testing on connectors in
> > > > > > non-production
> > > > > > > > > > > environments. It may help guide our decision making
> > process
> > > > to
> > > > > > > have a
> > > > > > > > > > > clearer picture of the goals we're trying to achieve.
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Agree. The Motivation section has been updated.
> > > > > > > > > > Thanks for the examples, I see both of them being covered
> > by
> > > > the
> > > > > > KIP.
> > > > > > > > > > I see how these could give us a good distinction on
> whether
> > > to
> > > > > > > position
> > > > > > > > > > some metrics at INFO or DEBUG level.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > 2. If we're trying to address the alert-and-diagnose
> use
> > > > case,
> > > > > > it'd
> > > > > > > > be
> > > > > > > > > > > useful to have as much information as possible at INFO
> > > level,
> > > > > > > rather
> > > > > > > > > than
> > > > > > > > > > > forcing cluster administrators to possibly reconfigure
> a
> > > > > > connector
> > > > > > > to
> > > > > > > > > emit
> > > > > > > > > > > DEBUG or TRACE level metrics in order to diagnose a
> > > potential
> > > > > > > > > > > production-impacting performance bottleneck. I can see
> > the
> > > > > > > rationale
> > > > > > > > > for
> > > > > > > > > > > emitting per-record metrics that track an average value
> > at
> > > > > DEBUG
> > > > > > > > > level, but
> > > > > > > > > > > for per-record metrics that track a maximum value, is
> > there
> > > > any
> > > > > > > > reason
> > > > > > > > > not
> > > > > > > > > > > to provide this information at INFO level?
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Agree. Though since Max and Avg metrics are part of the same
> > > > > > > > > > sensor — where the metric recording level is defined — both
> > > > > > > > > > metrics get the same level.
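> > > > > > > > > >
> > > > > > > > > > For illustration only, a minimal sketch of that behaviour (class,
> > > > > > > > > > sensor, and metric names here are placeholders, and I'm assuming the
> > > > > > > > > > usual Metrics/Sensor recording-level APIs):
> > > > > > > > > >
> > > > > > > > > > ```java
> > > > > > > > > > import org.apache.kafka.common.metrics.MetricConfig;
> > > > > > > > > > import org.apache.kafka.common.metrics.Metrics;
> > > > > > > > > > import org.apache.kafka.common.metrics.Sensor;
> > > > > > > > > > import org.apache.kafka.common.metrics.stats.Avg;
> > > > > > > > > > import org.apache.kafka.common.metrics.stats.Max;
> > > > > > > > > >
> > > > > > > > > > public class SensorLevelSketch {
> > > > > > > > > >     public static void main(String[] args) {
> > > > > > > > > >         // Worker configured to record metrics at INFO level.
> > > > > > > > > >         Metrics metrics = new Metrics(
> > > > > > > > > >             new MetricConfig().recordLevel(Sensor.RecordingLevel.INFO));
> > > > > > > > > >
> > > > > > > > > >         // One sensor holds both stats, so Max and Avg share its level.
> > > > > > > > > >         Sensor latency = metrics.sensor("record-latency", Sensor.RecordingLevel.DEBUG);
> > > > > > > > > >         latency.add(metrics.metricName("record-latency-max-ms", "sketch"), new Max());
> > > > > > > > > >         latency.add(metrics.metricName("record-latency-avg-ms", "sketch"), new Avg());
> > > > > > > > > >
> > > > > > > > > >         // DEBUG sensor + INFO config: record() is a no-op for both stats.
> > > > > > > > > >         System.out.println(latency.shouldRecord()); // false
> > > > > > > > > >         latency.record(42.0);
> > > > > > > > > >         metrics.close();
> > > > > > > > > >     }
> > > > > > > > > > }
> > > > > > > > > > ```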
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > 3. I'm also curious about the performance testing
> > suggested
> > > > by
> > > > > > Yash
> > > > > > > > to
> > > > > > > > > > > gauge the potential impact of this change. Have you
> been
> > > able
> > > > > to
> > > > > > do
> > > > > > > > any
> > > > > > > > > > > testing with your draft implementation yet?
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > No, not so far.
> > > > > > > > > > I think it would be valuable to discuss the scope of this
> > > > testing
> > > > > > and
> > > > > > > > > maybe
> > > > > > > > > > tackle it
> > > > > > > > > > in a separate issue as Sensors and Metrics are used all
> > over
> > > > the
> > > > > > > place.
> > > > > > > > > > My initial understanding is that these tests should be placed
> > > > > > > > > > in the jmh-benchmarks [1].
> > > > > > > > > > Then, we could target testing Sensors and Metrics, and validate
> > > > > > > > > > how much overhead is added by having only Max vs Max, Avg (, Min),
> > > > > > > > > > etc. On the other hand, we could extend this to Transformations or
> > > > > > > > > > other Connect layers.
> > > > > > > > > >
> > > > > > > > > > Here are some pointers to the Sensors and Metrics
> > > > implementations
> > > > > > > that
> > > > > > > > > > could be considered:
> > > > > > > > > > Path to metric recording:
> > > > > > > > > > -
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/kafka/blob/5cab11cf525f6c06fcf9eb43f7f95ef33fe1cdbb/clients/src/main/java/org/apache/kafka/common/metrics/Sensor.java#L195-L199
> > > > > > > > > > -
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/kafka/blob/5cab11cf525f6c06fcf9eb43f7f95ef33fe1cdbb/clients/src/main/java/org/apache/kafka/common/metrics/Sensor.java#L230-L244
> > > > > > > > > >
> > > > > > > > > > ```
> > > > > > > > > > // increment all the stats
> > > > > > > > > > for (StatAndConfig statAndConfig : this.stats) {
> > > > > > > > > >     statAndConfig.stat.record(statAndConfig.config(), value, timeMs);
> > > > > > > > > > }
> > > > > > > > > > ```
> > > > > > > > > >
> > > > > > > > > > SampledStats:
> > > > > > > > > > - Avg:
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/kafka/blob/068ab9cefae301f3187ea885d645c425955e77d2/clients/src/main/java/org/apache/kafka/common/metrics/stats/Avg.java
> > > > > > > > > > - Max:
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/kafka/blob/068ab9cefae301f3187ea885d645c425955e77d2/clients/src/main/java/org/apache/kafka/common/metrics/stats/Max.java
> > > > > > > > > > - Min:
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/kafka/blob/068ab9cefae301f3187ea885d645c425955e77d2/clients/src/main/java/org/apache/kafka/common/metrics/stats/Min.java
> > > > > > > > > >
> > > > > > > > > > `stat#record()` is implemented by the `update` method in each
> > > > > > > > > > SampledStat:
> > > > > > > > > >
> > > > > > > > > > ```Max.java
> > > > > > > > > >     @Override
> > > > > > > > > >     protected void update(Sample sample, MetricConfig config, double value, long now) {
> > > > > > > > > >         sample.value = Math.max(sample.value, value);
> > > > > > > > > >     }
> > > > > > > > > > ```
> > > > > > > > > >
> > > > > > > > > > ```Avg.java
> > > > > > > > > >     @Override
> > > > > > > > > >     protected void update(Sample sample, MetricConfig config, double value, long now) {
> > > > > > > > > >         sample.value += value;
> > > > > > > > > >     }
> > > > > > > > > > ```
> > > > > > > > > >
> > > > > > > > > > As far as I understand, most of the work of the stats happens in
> > > > > > > > > > the `combine` method, which is not part of the connector execution
> > > > > > > > > > path but is called when metrics are queried.
> > > > > > > > > >
> > > > > > > > > > I wonder whether we should consider Avg and Max for all the metrics
> > > > > > > > > > proposed, as the impact on the execution path seems minimal, and even
> > > > > > > > > > see if Min is also valuable, and use DEBUG only for more granular
> > > > > > > > > > metrics.
> > > > > > > > > >
> > > > > > > > > > [1]
> > > https://github.com/apache/kafka/tree/trunk/jmh-benchmarks
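> > > > > > > > > >
> > > > > > > > > > To make the scope a bit more concrete, a rough sketch of what such a
> > > > > > > > > > benchmark could look like under jmh-benchmarks (class and metric names
> > > > > > > > > > are made up; this only compares Max-only vs Max+Avg recording overhead):
> > > > > > > > > >
> > > > > > > > > > ```java
> > > > > > > > > > import org.apache.kafka.common.metrics.Metrics;
> > > > > > > > > > import org.apache.kafka.common.metrics.Sensor;
> > > > > > > > > > import org.apache.kafka.common.metrics.stats.Avg;
> > > > > > > > > > import org.apache.kafka.common.metrics.stats.Max;
> > > > > > > > > > import org.openjdk.jmh.annotations.Benchmark;
> > > > > > > > > > import org.openjdk.jmh.annotations.Scope;
> > > > > > > > > > import org.openjdk.jmh.annotations.Setup;
> > > > > > > > > > import org.openjdk.jmh.annotations.State;
> > > > > > > > > >
> > > > > > > > > > @State(Scope.Thread)
> > > > > > > > > > public class SensorRecordBenchmark {
> > > > > > > > > >     private Sensor maxOnly;
> > > > > > > > > >     private Sensor maxAndAvg;
> > > > > > > > > >
> > > > > > > > > >     @Setup
> > > > > > > > > >     public void setup() {
> > > > > > > > > >         Metrics metrics = new Metrics();
> > > > > > > > > >         maxOnly = metrics.sensor("max-only");
> > > > > > > > > >         maxOnly.add(metrics.metricName("max-only-latency-max", "bench"), new Max());
> > > > > > > > > >         maxAndAvg = metrics.sensor("max-and-avg");
> > > > > > > > > >         maxAndAvg.add(metrics.metricName("max-and-avg-latency-max", "bench"), new Max());
> > > > > > > > > >         maxAndAvg.add(metrics.metricName("max-and-avg-latency-avg", "bench"), new Avg());
> > > > > > > > > >     }
> > > > > > > > > >
> > > > > > > > > >     @Benchmark
> > > > > > > > > >     public void recordMaxOnly() {
> > > > > > > > > >         maxOnly.record(1.0); // per-record cost with a Max stat only
> > > > > > > > > >     }
> > > > > > > > > >
> > > > > > > > > >     @Benchmark
> > > > > > > > > >     public void recordMaxAndAvg() {
> > > > > > > > > >         maxAndAvg.record(1.0); // per-record cost with Max and Avg stats
> > > > > > > > > >     }
> > > > > > > > > > }
> > > > > > > > > > ```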
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > 4. Just to make sure I understand correctly--does "time
> > > when
> > > > it
> > > > > > has
> > > > > > > > > been
> > > > > > > > > > > received by the Sink task" refer to the wallclock time
> > > > directly
> > > > > > > > after a
> > > > > > > > > > > call to SinkTask::put has been completed (as opposed to
> > > > > directly
> > > > > > > > before
> > > > > > > > > > > that call is made, or something else entirely)?
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > It currently means the time when the record has been received by
> > > > > > > > > > the Sink task, right after consumer poll and before conversions.
> > > > > > > > > > Would it be valuable to have it after put-sink-records?
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > 5. If the goal is to identify performance bottlenecks
> > > (either
> > > > > in
> > > > > > > > > production
> > > > > > > > > > > or pre-production environments), would it make sense to
> > > > > introduce
> > > > > > > > > metrics
> > > > > > > > > > > for each individual converter (i.e., key/value/header)
> > and
> > > > > > > > > transformation?
> > > > > > > > > > > It's definitely an improvement to be able to identify
> the
> > > > total
> > > > > > > time
> > > > > > > > > for
> > > > > > > > > > > conversion and transformation, but then the immediate
> > > > follow-up
> > > > > > > > > question if
> > > > > > > > > > > a bottleneck is found in that phase is "which
> > > > > > > > converter/transformation
> > > > > > > > > is
> > > > > > > > > > > responsible?" It'd be nice if we could provide a way to
> > > > quickly
> > > > > > > > answer
> > > > > > > > > that
> > > > > > > > > > > question.
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > This is a great idea. I'd like to consider this as well, though
> > > > > > > > > > maybe these more granular metrics would be better kept at DEBUG
> > > > > > > > > > level.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > 6. Any thoughts about offering latency metrics for
> source
> > > > tasks
> > > > > > > > between
> > > > > > > > > > > receipt of the record from the task and delivery of the
> > > > record
> > > > > to
> > > > > > > > Kafka
> > > > > > > > > > > (which would be tracked by producer callback)? We could
> > > also
> > > > > use
> > > > > > > the
> > > > > > > > > record
> > > > > > > > > > > timestamp either instead of or in addition to receipt
> > time
> > > if
> > > > > the
> > > > > > > > task
> > > > > > > > > > > provides a timestamp with its records.
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > With source transform and convert metrics we get part of that
> > > > > > > > > > latency.
> > > > > > > > > > Looking at the Producer metrics, `request-latency` (though a very
> > > > > > > > > > generic metric) sort of answers the time between send request and
> > > > > > > > > > ack — if my understanding is correct.
> > > > > > > > > > Would these be enough, or are you thinking about another approach?
> > > > > > > > > > Maybe a custom metric to cover the producer side?
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > 7. We may end up introducing a way for sink tasks to
> > record
> > > > > > > > per-record
> > > > > > > > > > > delivery to the sink system (see KIP-767 [1]). I'd like
> > it
> > > if
> > > > > we
> > > > > > > > could
> > > > > > > > > keep
> > > > > > > > > > > the names of our metrics very precise in order to avoid
> > > > > confusing
> > > > > > > > users
> > > > > > > > > > > (who may think that we're providing metrics on actual
> > > > delivery
> > > > > to
> > > > > > > the
> > > > > > > > > sink
> > > > > > > > > > > system, which may not be the case if the connector
> > performs
> > > > > > > > > asynchronous
> > > > > > > > > > > writes), and in order to leave room for a metric on true
> > > > > > > > > > > delivery time by
> > > > > > > > > > > sink tasks. It'd also be nice if we could remain
> > consistent
> > > > > with
> > > > > > > > > existing
> > > > > > > > > > > metrics such as "put-batch-avg-time-ms". With that in
> > mind,
> > > > > what
> > > > > > do
> > > > > > > > you
> > > > > > > > > > > think about renaming these metrics:
> > > > > > > > > > > - "sink-record-batch-latency-max-ms" to
> > > > > > "put-batch-avg-latency-ms"
> > > > > > > > > > > - "sink-record-latency-max-ms" to
> > > > > > "put-sink-record-latency-max-ms"
> > > > > > > > > > > - "sink-record-latency-avg-ms" to
> > > > > > "put-sink-record-latency-avg-ms"
> > > > > > > > > > > - "sink-record-convert-transform-time-max-ms" to
> > > > > > > > > > > "convert-transform-sink-record-time-max-ms"
> > > > > > > > > > > - "sink-record-convert-transform-time-avg-ms" to
> > > > > > > > > > > "convert-transform-sink-record-time-avg-ms"
> > > > > > > > > > > - "source-record-transform-convert-time-max-ms" to
> > > > > > > > > > > "transform-convert-source-record-time-max-ms"
> > > > > > > > > > > - "source-record-transform-convert-time-avg-ms" to
> > > > > > > > > > > "transform-convert-source-record-time-avg-ms"
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Makes sense, thanks! I have updated the list of metrics, grouped
> > > > > > > > > > them by sensor, and applied these suggestions.
> > > > > > > > > > The only ones that I want to review are: sink-record-* to
> > > > > > > > > > put-batch-* (first 3). Not sure if put-batch/put-sink-record
> > > > > > > > > > describes the purpose of the metric — neither does
> > > > > > > > > > `sink-record-latency`, to be honest.
> > > > > > > > > > My initial thought was to have something like Kafka Streams
> > > > > > > > > > e2e-latency.
> > > > > > > > > > Based on questions 4. and 6., an idea could be to add:
> > > > > > > > > > - source-batch-e2e-latency-before-send: measure wallclock - source
> > > > > > > > > >   record timestamp after source connector poll.
> > > > > > > > > > - source-batch-e2e-latency-after-send: measure wallclock - record
> > > > > > > > > >   timestamp on producer send callback.
> > > > > > > > > > - sink-batch-e2e-latency-before-put: measure wallclock - record
> > > > > > > > > >   timestamp after consumer poll.
> > > > > > > > > > - sink-batch-e2e-latency-after-put: measure wallclock - record
> > > > > > > > > >   timestamp after sink connector put.
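> > > > > > > > > >
> > > > > > > > > > As a rough illustration of one of those measurement points (the metric
> > > > > > > > > > names are the proposed ones above; everything else is just a runnable
> > > > > > > > > > placeholder):
> > > > > > > > > >
> > > > > > > > > > ```java
> > > > > > > > > > import org.apache.kafka.common.metrics.Metrics;
> > > > > > > > > > import org.apache.kafka.common.metrics.Sensor;
> > > > > > > > > > import org.apache.kafka.common.metrics.stats.Max;
> > > > > > > > > > import org.apache.kafka.common.utils.Time;
> > > > > > > > > >
> > > > > > > > > > public class E2eLatencySketch {
> > > > > > > > > >     public static void main(String[] args) {
> > > > > > > > > >         Metrics metrics = new Metrics();
> > > > > > > > > >         Sensor beforePut = metrics.sensor("sink-batch-e2e-latency-before-put");
> > > > > > > > > >         beforePut.add(
> > > > > > > > > >             metrics.metricName("sink-batch-e2e-latency-before-put-max-ms", "sketch"),
> > > > > > > > > >             new Max());
> > > > > > > > > >
> > > > > > > > > >         Time time = Time.SYSTEM;
> > > > > > > > > >         // Timestamp of the oldest record in the polled batch; normally taken
> > > > > > > > > >         // from the consumer records, hard-coded here to keep the sketch runnable.
> > > > > > > > > >         long oldestRecordTimestampMs = time.milliseconds() - 1500L;
> > > > > > > > > >
> > > > > > > > > >         // "wallclock - record timestamp" right after consumer poll, before conversions.
> > > > > > > > > >         beforePut.record(time.milliseconds() - oldestRecordTimestampMs);
> > > > > > > > > >         metrics.close();
> > > > > > > > > >     }
> > > > > > > > > > }
> > > > > > > > > > ```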
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > Thanks again for the KIP! Looking forward to your
> > thoughts.
> > > > > > > > > > >
> > > > > > > > > > > Cheers,
> > > > > > > > > > >
> > > > > > > > > > > Chris
> > > > > > > > > > >
> > > > > > > > > > > [1] -
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-767%3A+Connect+Latency+Metrics
> > > > > > > > > > >
> > > > > > > > > > > On Thu, Sep 15, 2022 at 1:32 PM Jorge Esteban Quilcate
> > > Otoya
> > > > <
> > > > > > > > > > > quilcate.jorge@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi everyone,
> > > > > > > > > > > >
> > > > > > > > > > > > I've made a slight addition to the KIP based on Yash's
> > > > > > > > > > > > feedback:
> > > > > > > > > > > >
> > > > > > > > > > > > - A new metric is added at INFO level to record the
> max
> > > > > latency
> > > > > > > > from
> > > > > > > > > the
> > > > > > > > > > > > batch timestamp, by keeping the oldest record
> timestamp
> > > per
> > > > > > > batch.
> > > > > > > > > > > > - A draft implementation is linked.
> > > > > > > > > > > >
> > > > > > > > > > > > Looking forward to your feedback.
> > > > > > > > > > > > Also, a kind reminder that the vote thread is open.
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks!
> > > > > > > > > > > > Jorge.
> > > > > > > > > > > >
> > > > > > > > > > > > On Thu, 8 Sept 2022 at 14:25, Jorge Esteban Quilcate
> > > Otoya
> > > > <
> > > > > > > > > > > > quilcate.jorge@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Great. I have updated the KIP to reflect this.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > Jorge.
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Thu, 8 Sept 2022 at 12:26, Yash Mayya <
> > > > > > yash.mayya@gmail.com
> > > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > >> Thanks, I think it makes sense to define these
> > metrics
> > > > at
> > > > > a
> > > > > > > > DEBUG
> > > > > > > > > > > > >> recording
> > > > > > > > > > > > >> level.
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> On Thu, Sep 8, 2022 at 2:51 PM Jorge Esteban
> > Quilcate
> > > > > Otoya
> > > > > > <
> > > > > > > > > > > > >> quilcate.jorge@gmail.com> wrote:
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> > On Thu, 8 Sept 2022 at 05:55, Yash Mayya <
> > > > > > > > yash.mayya@gmail.com>
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > >> >
> > > > > > > > > > > > >> > > Hi Jorge,
> > > > > > > > > > > > >> > >
> > > > > > > > > > > > >> > > Thanks for the changes. With regard to having
> > per
> > > > > batch
> > > > > > vs
> > > > > > > > per
> > > > > > > > > > > > record
> > > > > > > > > > > > >> > > metrics, the additional overhead I was
> referring
> > > to
> > > > > > wasn't
> > > > > > > > > about
> > > > > > > > > > > > >> whether
> > > > > > > > > > > > >> > or
> > > > > > > > > > > > >> > > not we would need to iterate over all the
> > records
> > > > in a
> > > > > > > > batch.
> > > > > > > > > I
> > > > > > > > > > > was
> > > > > > > > > > > > >> > > referring to the potential additional overhead
> > > > caused
> > > > > by
> > > > > > > the
> > > > > > > > > > > higher
> > > > > > > > > > > > >> > volume
> > > > > > > > > > > > >> > > of calls to Sensor::record on the sensors for
> > the
> > > > new
> > > > > > > > metrics
> > > > > > > > > (as
> > > > > > > > > > > > >> > compared
> > > > > > > > > > > > >> > > to the existing batch only metrics),
> especially
> > > for
> > > > > high
> > > > > > > > > > > throughput
> > > > > > > > > > > > >> > > connectors where batch sizes could be large. I
> > > guess
> > > > > we
> > > > > > > may
> > > > > > > > > want
> > > > > > > > > > > to
> > > > > > > > > > > > do
> > > > > > > > > > > > >> > some
> > > > > > > > > > > > >> > > sort of performance testing and get concrete
> > > numbers
> > > > > to
> > > > > > > > verify
> > > > > > > > > > > > whether
> > > > > > > > > > > > >> > this
> > > > > > > > > > > > >> > > is a valid concern or not?
> > > > > > > > > > > > >> > >
> > > > > > > > > > > > >> >
> > > > > > > > > > > > >> > 6.1. Got it, thanks for clarifying. I guess
> there
> > > > could
> > > > > > be a
> > > > > > > > > > > benchmark
> > > > > > > > > > > > >> test
> > > > > > > > > > > > >> > of the `Sensor::record` to get an idea of the
> > > > > performance
> > > > > > > > > impact.
> > > > > > > > > > > > >> > Regardless, the fact that these are
> single-record
> > > > > metrics
> > > > > > > > > compared
> > > > > > > > > > > to
> > > > > > > > > > > > >> > existing batch-only could be explicitly defined
> by
> > > > > setting
> > > > > > > > these
> > > > > > > > > > > > >> metrics at
> > > > > > > > > > > > >> > a DEBUG or TRACE metric recording level, leaving
> > the
> > > > > > > existing
> > > > > > > > at
> > > > > > > > > > > INFO
> > > > > > > > > > > > >> > level.
> > > > > > > > > > > > >> > wdyt?
> > > > > > > > > > > > >> >
> > > > > > > > > > > > >> >
> > > > > > > > > > > > >> > >
> > > > > > > > > > > > >> > > Thanks,
> > > > > > > > > > > > >> > > Yash
> > > > > > > > > > > > >> > >
> > > > > > > > > > > > >> > > On Tue, Sep 6, 2022 at 4:42 PM Jorge Esteban
> > > > Quilcate
> > > > > > > Otoya
> > > > > > > > <
> > > > > > > > > > > > >> > > quilcate.jorge@gmail.com> wrote:
> > > > > > > > > > > > >> > >
> > > > > > > > > > > > >> > > > Hi Sagar and Yash,
> > > > > > > > > > > > >> > > >
> > > > > > > > > > > > >> > > > > the way it's defined in
> > > > > > > > > > > > >> > > >
> > > > > > > > https://kafka.apache.org/documentation/#connect_monitoring
> > > > > > > > > for
> > > > > > > > > > > > the
> > > > > > > > > > > > >> > > metrics
> > > > > > > > > > > > >> > > >
> > > > > > > > > > > > >> > > > 4.1. Got it. Add it to the KIP.
> > > > > > > > > > > > >> > > >
> > > > > > > > > > > > >> > > > > The only thing I would argue is do we need
> > > > > > > > > > > > >> sink-record-latency-min?
> > > > > > > > > > > > >> > > Maybe
> > > > > > > > > > > > >> > > > we
> > > > > > > > > > > > >> > > > > could remove this min metric as well and
> > make
> > > > all
> > > > > of
> > > > > > > the
> > > > > > > > > 3 e2e
> > > > > > > > > > > > >> > metrics
> > > > > > > > > > > > >> > > > > consistent
> > > > > > > > > > > > >> > > >
> > > > > > > > > > > > >> > > > 4.2 I see. Will remove it from the KIP.
> > > > > > > > > > > > >> > > >
> > > > > > > > > > > > >> > > > > Probably users can track the metrics at
> > their
> > > > end
> > > > > to
> > > > > > > > > > > > >> > > > > figure that out. Do you think that makes
> > > sense?
> > > > > > > > > > > > >> > > >
> > > > > > > > > > > > >> > > > 4.3. Yes, agree. With these new metrics it
> > > should
> > > > be
> > > > > > > > easier
> > > > > > > > > for
> > > > > > > > > > > > >> users
> > > > > > > > > > > > >> > to
> > > > > > > > > > > > >> > > > track this.
> > > > > > > > > > > > >> > > >
> > > > > > > > > > > > >> > > > > I think it makes sense to not have a min
> > > metric
> > > > > for
> > > > > > > > > either to
> > > > > > > > > > > > >> remain
> > > > > > > > > > > > >> > > > > consistent with the existing put-batch and
> > > > > > poll-batch
> > > > > > > > > metrics
> > > > > > > > > > > > >> > > >
> > > > > > > > > > > > >> > > > 5.1. Got it. Same as 4.2
> > > > > > > > > > > > >> > > >
> > > > > > > > > > > > >> > > > > Another naming related suggestion I had
> was
> > > with
> > > > > the
> > > > > > > > > > > > >> > > > > "convert-time" metrics - we should
> probably
> > > > > include
> > > > > > > > > > > > >> transformations
> > > > > > > > > > > > >> > in
> > > > > > > > > > > > >> > > > the
> > > > > > > > > > > > >> > > > > name since SMTs could definitely be
> > > attributable
> > > > > to
> > > > > > a
> > > > > > > > > sizable
> > > > > > > > > > > > >> chunk
> > > > > > > > > > > > >> > of
> > > > > > > > > > > > >> > > > the
> > > > > > > > > > > > >> > > > > latency depending on the specific
> > > transformation
> > > > > > > chain.
> > > > > > > > > > > > >> > > >
> > > > > > > > > > > > >> > > > 5.2. Make sense. I'm proposing to add
> > > > > > > > > > > > >> > `sink-record-convert-transform...`
> > > > > > > > > > > > >> > > > and `source-record-transform-convert...` to
> > > > > represent
> > > > > > > > > correctly
> > > > > > > > > > > > the
> > > > > > > > > > > > >> > order
> > > > > > > > > > > > >> > > > of operations.
> > > > > > > > > > > > >> > > >
> > > > > > > > > > > > >> > > > > it seems like both source and sink tasks
> > only
> > > > > record
> > > > > > > > > metrics
> > > > > > > > > > > at
> > > > > > > > > > > > a
> > > > > > > > > > > > >> > > "batch"
> > > > > > > > > > > > >> > > > > level, not on an individual record level.
> I
> > > > think
> > > > > it
> > > > > > > > > might be
> > > > > > > > > > > > >> > > additional
> > > > > > > > > > > > >> > > > > overhead if we want to record these new
> > > metrics
> > > > > all
> > > > > > at
> > > > > > > > the
> > > > > > > > > > > > record
> > > > > > > > > > > > >> > > level?
> > > > > > > > > > > > >> > > >
> > > > > > > > > > > >> > > > 5.3. I considered at the beginning implementing all
> > > > > > > > > > > >> > > > metrics at the batch level, but given how the framework
> > > > > > > > > > > >> > > > processes records, I fell back to the proposed approach:
> > > > > > > > > > > > >> > > > - Sink Task:
> > > > > > > > > > > > >> > > >   - `WorkerSinkTask#convertMessages(msgs)`
> > > already
> > > > > > > > iterates
> > > > > > > > > over
> > > > > > > > > > > > >> > records,
> > > > > > > > > > > > >> > > > so there is no additional overhead to
> capture
> > > > record
> > > > > > > > > latency per
> > > > > > > > > > > > >> > record.
> > > > > > > > > > > > >> > > >     -
> > > > > > > > > > > > >> > > >
> > > > > > > > > > > > >> > > >
> > > > > > > > > > > > >> > >
> > > > > > > > > > > > >> >
> > > > > > > > > > > > >>
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/kafka/blob/9841647c4fe422532f448423c92d26e4fdcb8932/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSinkTask.java#L490-L514
> > > > > > > > > > > > >> > > >   -
> > > > > `WorkerSinkTask#convertAndTransformRecord(record)`
> > > > > > > > > actually
> > > > > > > > > > > > >> happens
> > > > > > > > > > > > >> > > > individually. Measuring this operation per
> > batch
> > > > > would
> > > > > > > > > include
> > > > > > > > > > > > >> > processing
> > > > > > > > > > > > >> > > > that is not strictly part of "convert and
> > > > transform"
> > > > > > > > > > > > >> > > >     -
> > > > > > > > > > > > >> > > >
> > > > > > > > > > > > >> > > >
> > > > > > > > > > > > >> > >
> > > > > > > > > > > > >> >
> > > > > > > > > > > > >>
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/kafka/blob/9841647c4fe422532f448423c92d26e4fdcb8932/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSinkTask.java#L518
> > > > > > > > > > > > >> > > > - Source Task:
> > > > > > > > > > > > >> > > >   - `AbstractWorkerSourceTask#sendRecords`
> > > > iterates
> > > > > > > over a
> > > > > > > > > batch
> > > > > > > > > > > > and
> > > > > > > > > > > >> > > > applies transforms and converts each record
> > > > > > > > > > > >> > > > individually as well:
> > > > > > > > > > > > >> > > >     -
> > > > > > > > > > > > >> > > >
> > > > > > > > > > > > >> > > >
> > > > > > > > > > > > >> > >
> > > > > > > > > > > > >> >
> > > > > > > > > > > > >>
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/kafka/blob/9841647c4fe422532f448423c92d26e4fdcb8932/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/AbstractWorkerSourceTask.java#L389-L390
> > > > > > > > > > > > >> > > >
> > > > > > > > > > > > >> > > > > This might require some additional
> changes -
> > > > > > > > > > > > >> > > > > for instance, with the
> "sink-record-latency"
> > > > > metric,
> > > > > > > we
> > > > > > > > > might
> > > > > > > > > > > > only
> > > > > > > > > > > > >> > want
> > > > > > > > > > > > >> > > > to
> > > > > > > > > > > > >> > > > > have a "max" metric since "avg" would
> > require
> > > > > > > recording
> > > > > > > > a
> > > > > > > > > > > value
> > > > > > > > > > > > on
> > > > > > > > > > > > >> > the
> > > > > > > > > > > > >> > > > > sensor for each record (whereas we can
> get a
> > > > "max"
> > > > > > by
> > > > > > > > only
> > > > > > > > > > > > >> recording
> > > > > > > > > > > > >> > a
> > > > > > > > > > > > >> > > > > metric value for the oldest record in each
> > > > batch).
> > > > > > > > > > > > >> > > >
> > > > > > > > > > > > >> > > > 5.4. Recording record-latency per batch may
> > not
> > > be
> > > > > as
> > > > > > > > > useful as
> > > > > > > > > > > > >> there
> > > > > > > > > > > > >> > is
> > > > > > > > > > > > >> > > no
> > > > > > > > > > > > >> > > > guarantee that the oldest record will be
> > > > > > representative
> > > > > > > of
> > > > > > > > > the
> > > > > > > > > > > > >> batch.
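> > > > > > > > > > > >> > > >
> > > > > > > > > > > >> > > > (Just to illustrate the "oldest record per batch" idea discussed
> > > > > > > > > > > >> > > > above — timestamps are faked here; in the worker they would come
> > > > > > > > > > > >> > > > from the consumer records, and all names are placeholders:)
> > > > > > > > > > > >> > > >
> > > > > > > > > > > >> > > > ```java
> > > > > > > > > > > >> > > > import java.util.Arrays;
> > > > > > > > > > > >> > > > import java.util.List;
> > > > > > > > > > > >> > > > import org.apache.kafka.common.metrics.Metrics;
> > > > > > > > > > > >> > > > import org.apache.kafka.common.metrics.Sensor;
> > > > > > > > > > > >> > > > import org.apache.kafka.common.metrics.stats.Max;
> > > > > > > > > > > >> > > >
> > > > > > > > > > > >> > > > public class OldestRecordPerBatchSketch {
> > > > > > > > > > > >> > > >     public static void main(String[] args) {
> > > > > > > > > > > >> > > >         Metrics metrics = new Metrics();
> > > > > > > > > > > >> > > >         Sensor latencyMax = metrics.sensor("sink-record-latency");
> > > > > > > > > > > >> > > >         latencyMax.add(
> > > > > > > > > > > >> > > >             metrics.metricName("sink-record-latency-max-ms", "sketch"), new Max());
> > > > > > > > > > > >> > > >
> > > > > > > > > > > >> > > >         long now = System.currentTimeMillis();
> > > > > > > > > > > >> > > >         List<Long> batchTimestamps = Arrays.asList(now - 50, now - 500, now - 5);
> > > > > > > > > > > >> > > >
> > > > > > > > > > > >> > > >         // One record() call per batch, using the oldest timestamp, is enough
> > > > > > > > > > > >> > > >         // to keep a "max" latency accurate (unlike an "avg").
> > > > > > > > > > > >> > > >         long oldest = Long.MAX_VALUE;
> > > > > > > > > > > >> > > >         for (long ts : batchTimestamps)
> > > > > > > > > > > >> > > >             oldest = Math.min(oldest, ts);
> > > > > > > > > > > >> > > >         if (oldest != Long.MAX_VALUE)
> > > > > > > > > > > >> > > >             latencyMax.record(now - oldest);
> > > > > > > > > > > >> > > >         metrics.close();
> > > > > > > > > > > >> > > >     }
> > > > > > > > > > > >> > > > }
> > > > > > > > > > > >> > > > ```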
> > > > > > > > > > > > >> > > >
> > > > > > > > > > > > >> > > > On Sat, 3 Sept 2022 at 16:02, Yash Mayya <
> > > > > > > > > yash.mayya@gmail.com>
> > > > > > > > > > > > >> wrote:
> > > > > > > > > > > > >> > > >
> > > > > > > > > > > > >> > > > > Hi Jorge and Sagar,
> > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > >> > > > > I think it makes sense to not have a min
> > > metric
> > > > > for
> > > > > > > > > either to
> > > > > > > > > > > > >> remain
> > > > > > > > > > > > >> > > > > consistent with the existing put-batch and
> > > > > > poll-batch
> > > > > > > > > metrics
> > > > > > > > > > > > (it
> > > > > > > > > > > > >> > > doesn't
> > > > > > > > > > > > >> > > > > seem particularly useful either anyway).
> > Also,
> > > > the
> > > > > > new
> > > > > > > > > > > > >> > > > > "sink-record-latency" metric name looks
> fine
> > > to
> > > > > me,
> > > > > > > > > thanks for
> > > > > > > > > > > > >> making
> > > > > > > > > > > > >> > > the
> > > > > > > > > > > > >> > > > > changes! Another naming related
> suggestion I
> > > had
> > > > > was
> > > > > > > > with
> > > > > > > > > the
> > > > > > > > > > > > >> > > > > "convert-time" metrics - we should
> probably
> > > > > include
> > > > > > > > > > > > >> transformations
> > > > > > > > > > > > >> > in
> > > > > > > > > > > > >> > > > the
> > > > > > > > > > > > >> > > > > name since SMTs could definitely be
> > > attributable
> > > > > to
> > > > > > a
> > > > > > > > > sizable
> > > > > > > > > > > > >> chunk
> > > > > > > > > > > > >> > of
> > > > > > > > > > > > >> > > > the
> > > > > > > > > > > > >> > > > > latency depending on the specific
> > > transformation
> > > > > > > chain.
> > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > >> > > > > I have one high level question with
> respect
> > to
> > > > > > > > > implementation
> > > > > > > > > > > -
> > > > > > > > > > > > >> > > > currently,
> > > > > > > > > > > > >> > > > > it seems like both source and sink tasks
> > only
> > > > > record
> > > > > > > > > metrics
> > > > > > > > > > > at
> > > > > > > > > > > > a
> > > > > > > > > > > > >> > > "batch"
> > > > > > > > > > > > >> > > > > level, not on an individual record level.
> I
> > > > think
> > > > > it
> > > > > > > > > might be
> > > > > > > > > > > > >> > > additional
> > > > > > > > > > > > >> > > > > overhead if we want to record these new
> > > metrics
> > > > > all
> > > > > > at
> > > > > > > > the
> > > > > > > > > > > > record
> > > > > > > > > > > > >> > > level?
> > > > > > > > > > > > >> > > > > Could we instead make all of these new
> > metrics
> > > > for
> > > > > > > > > batches of
> > > > > > > > > > > > >> records
> > > > > > > > > > > > >> > > > > rather than individual records in order to
> > > > remain
> > > > > > > > > consistent
> > > > > > > > > > > > with
> > > > > > > > > > > > >> the
> > > > > > > > > > > > >> > > > > existing task level metrics? This might
> > > require
> > > > > some
> > > > > > > > > > > additional
> > > > > > > > > > > > >> > > changes -
> > > > > > > > > > > > >> > > > > for instance, with the
> "sink-record-latency"
> > > > > metric,
> > > > > > > we
> > > > > > > > > might
> > > > > > > > > > > > only
> > > > > > > > > > > > >> > want
> > > > > > > > > > > > >> > > > to
> > > > > > > > > > > > >> > > > > have a "max" metric since "avg" would
> > require
> > > > > > > recording
> > > > > > > > a
> > > > > > > > > > > value
> > > > > > > > > > > > on
> > > > > > > > > > > > >> > the
> > > > > > > > > > > > >> > > > > sensor for each record (whereas we can
> get a
> > > > "max"
> > > > > > by
> > > > > > > > only
> > > > > > > > > > > > >> recording
> > > > > > > > > > > > >> > a
> > > > > > > > > > > > >> > > > > metric value for the oldest record in each
> > > > batch).
> > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > >> > > > > Thanks,
> > > > > > > > > > > > >> > > > > Yash
> > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > >> > > > > On Fri, Sep 2, 2022 at 3:16 PM Sagar <
> > > > > > > > > > > sagarmeansocean@gmail.com
> > > > > > > > > > > > >
> > > > > > > > > > > > >> > > wrote:
> > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > >> > > > > > Hi Jorge,
> > > > > > > > > > > > >> > > > > >
> > > > > > > > > > > > >> > > > > > Thanks for the changes.
> > > > > > > > > > > > >> > > > > >
> > > > > > > > > > > > >> > > > > > Regarding the metrics, I meant something
> > > like
> > > > > > this:
> > > > > > > > > > > > >> > > > > >
> > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > >> > > >
> > > > > > > > > > > > >> > >
> > > > > > > > > > > > >> >
> > > > > > > > > > > > >>
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> kafka.connect:type=sink-task-metrics,connector="{connector}",task="{task}"
> > > > > > > > > > > > >> > > > > >
> > > > > > > > > > > > >> > > > > > the way it's defined in
> > > > > > > > > > > > >> > > > > >
> > > > > > > > > https://kafka.apache.org/documentation/#connect_monitoring
> > > > > > > > > > > > for
> > > > > > > > > > > > >> the
> > > > > > > > > > > > >> > > > > > metrics.
> > > > > > > > > > > > >> > > > > >
> > > > > > > > > > > > >> > > > > > I see what you mean by the 3 metrics and
> > how
> > > > it
> > > > > > can
> > > > > > > be
> > > > > > > > > > > > >> interpreted.
> > > > > > > > > > > > >> > > The
> > > > > > > > > > > > >> > > > > > only thing I would argue is do we need
> > > > > > > > > > > > sink-record-latency-min?
> > > > > > > > > > > > >> > Maybe
> > > > > > > > > > > > >> > > > we
> > > > > > > > > > > > >> > > > > > could remove this min metric as well and
> > > make
> > > > > all
> > > > > > of
> > > > > > > > > the 3
> > > > > > > > > > > e2e
> > > > > > > > > > > > >> > > metrics
> > > > > > > > > > > > >> > > > > > consistent(since put-batch also doesn't
> > > > expose a
> > > > > > min
> > > > > > > > > which
> > > > > > > > > > > > makes
> > > > > > > > > > > > >> > > sense
> > > > > > > > > > > > >> > > > to
> > > > > > > > > > > > >> > > > > > me). I think this is in contrast to what
> > > Yash
> > > > > > > pointed
> > > > > > > > > out
> > > > > > > > > > > > above
> > > > > > > > > > > > >> so
> > > > > > > > > > > > >> > I
> > > > > > > > > > > > >> > > > > would
> > > > > > > > > > > > >> > > > > > like to hear his thoughts as well.
> > > > > > > > > > > > >> > > > > >
> > > > > > > > > > > > >> > > > > > The other point Yash mentioned about the
> > > > > slightly
> > > > > > > > flawed
> > > > > > > > > > > > >> definition
> > > > > > > > > > > > >> > > of
> > > > > > > > > > > > >> > > > > e2e
> > > > > > > > > > > >> > > > > > is also true in a sense. But I have a feeling that
> > > > > > > > > > > >> > > > > > once the records are polled by the connector tasks,
> > > > > > > > > > > >> > > > > > it would be difficult to track the final leg via the
> > > > > > > > > > > >> > > > > > framework. Probably users can track the metrics at
> > > > > > > > > > > >> > > > > > their end to figure that out. Do you think that makes
> > > > > > > > > > > >> > > > > > sense?
> > > > > > > > > > > > >> > > > > >
> > > > > > > > > > > > >> > > > > > Thanks!
> > > > > > > > > > > > >> > > > > > Sagar.
> > > > > > > > > > > > >> > > > > >
> > > > > > > > > > > > >> > > > > >
> > > > > > > > > > > > >> > > > > >
> > > > > > > > > > > > >> > > > > >
> > > > > > > > > > > > >> > > > > > On Thu, Sep 1, 2022 at 11:40 PM Jorge
> > > Esteban
> > > > > > > Quilcate
> > > > > > > > > > > Otoya <
> > > > > > > > > > > > >> > > > > > quilcate.jorge@gmail.com> wrote:
> > > > > > > > > > > > >> > > > > >
> > > > > > > > > > > > >> > > > > > > Hi Sagar and Yash,
> > > > > > > > > > > > >> > > > > > >
> > > > > > > > > > > > >> > > > > > > Thanks for your feedback!
> > > > > > > > > > > > >> > > > > > >
> > > > > > > > > > > > >> > > > > > > > 1) I am assuming the new metrics
> would
> > > be
> > > > > task
> > > > > > > > level
> > > > > > > > > > > > metric.
> > > > > > > > > > > > >> > > > > > >
> > > > > > > > > > > > >> > > > > > > 1.1 Yes, it will be a task level
> metric,
> > > > > > > implemented
> > > > > > > > > on
> > > > > > > > > > > the
> > > > > > > > > > > > >> > > > > > > Worker[Source/Sink]Task.
> > > > > > > > > > > > >> > > > > > >
> > > > > > > > > > > > >> > > > > > > > Could you specify the way it's done
> > for
> > > > > other
> > > > > > > > > > > sink/source
> > > > > > > > > > > > >> > > > connector?
> > > > > > > > > > > > >> > > > > > >
> > > > > > > > > > > > >> > > > > > > 1.2. Not sure what do you mean by
> this.
> > > > Could
> > > > > > you
> > > > > > > > > > > elaborate
> > > > > > > > > > > > a
> > > > > > > > > > > > >> bit
> > > > > > > > > > > > >> > > > more?
> > > > > > > > > > > > >> > > > > > >
> > > > > > > > > > > > >> > > > > > > > 2. I am slightly confused about the
> > e2e
> > > > > > latency
> > > > > > > > > > > metric...
> > > > > > > > > > > > >> > > > > > >
> > > > > > > > > > > > >> > > > > > > 2.1. Yes, I see. I was trying to
> bring a
> > > > > similar
> > > > > > > > > concept
> > > > > > > > > > > as
> > > > > > > > > > > > in
> > > > > > > > > > > > >> > > > Streams
> > > > > > > > > > > > >> > > > > > with
> > > > > > > > > > > > >> > > > > > > KIP-613, though the e2e concept may
> not
> > be
> > > > > > > > > translatable.
> > > > > > > > > > > > >> > > > > > > We could keep it as
> > `sink-record-latency`
> > > to
> > > > > > avoid
> > > > > > > > > > > > conflating
> > > > > > > > > > > > >> > > > > concepts. A
> > > > > > > > > > > > >> > > > > > > similar metric naming was proposed in
> > > > KIP-489
> > > > > > but
> > > > > > > at
> > > > > > > > > the
> > > > > > > > > > > > >> consumer
> > > > > > > > > > > > >> > > > > level —
> > > > > > > > > > > > >> > > > > > > though it seems dormant for a couple
> of
> > > > years.
> > > > > > > > > > > > >> > > > > > >
> > > > > > > > > > > > >> > > > > > > > However, the put-batch time measures
> > the
> > > > > > > > > > > > >> > > > > > > > time to put a batch of records to
> > > external
> > > > > > sink.
> > > > > > > > > So, I
> > > > > > > > > > > > would
> > > > > > > > > > > > >> > > assume
> > > > > > > > > > > > >> > > > > > the 2
> > > > > > > > > > > > >> > > > > > > > can't be added as is to compute the
> > e2e
> > > > > > latency.
> > > > > > > > > Maybe I
> > > > > > > > > > > > am
> > > > > > > > > > > > >> > > missing
> > > > > > > > > > > > >> > > > > > > > something here. Could you plz
> clarify
> > > > this.
> > > > > > > > > > > > >> > > > > > >
> > > > > > > > > > > > >> > > > > > > 2.2. Yes, agree. Not necessarily
> added,
> > > but
> > > > > with
> > > > > > > > the 3
> > > > > > > > > > > > >> latencies
> > > > > > > > > > > > >> > > > (poll,
> > > > > > > > > > > > >> > > > > > > convert, putBatch) will be clearer
> where
> > > the
> > > > > > > > > bottleneck
> > > > > > > > > > > may
> > > > > > > > > > > > >> be,
> > > > > > > > > > > > >> > and
> > > > > > > > > > > > >> > > > > > > represent the internal processing.
> > > > > > > > > > > > >> > > > > > >
> > > > > > > > > > > > >> > > > > > > > however, as per the KIP it looks
> like
> > it
> > > > > will
> > > > > > be
> > > > > > > > > > > > >> > > > > > > > the latency between when the record
> > was
> > > > > > written
> > > > > > > to
> > > > > > > > > Kafka
> > > > > > > > > > > > and
> > > > > > > > > > > > >> > when
> > > > > > > > > > > > >> > > > the
> > > > > > > > > > > > >> > > > > > > > record is returned by a sink task's
> > > > > consumer's
> > > > > > > > poll?
> > > > > > > > > > > > >> > > > > > >
> > > > > > > > > > > > >> > > > > > > 3.1. Agree. 2.1. could help to clarify
> > > this.
> > > > > > > > > > > > >> > > > > > >
> > > > > > > > > > > > >> > > > > > > > One more thing - I was wondering
> > > > > > > > > > > > >> > > > > > > > if there's a particular reason for
> > > having
> > > > a
> > > > > > min
> > > > > > > > > metric
> > > > > > > > > > > for
> > > > > > > > > > > > >> e2e
> > > > > > > > > > > > >> > > > > latency
> > > > > > > > > > > > >> > > > > > > but
> > > > > > > > > > > > >> > > > > > > > not for convert time?
> > > > > > > > > > > > >> > > > > > >
> > > > > > > > > > > > >> > > > > > > 3.2. Was following KIP-613 for e2e
> which
> > > > seems
> > > > > > > > useful
> > > > > > > > > to
> > > > > > > > > > > > >> compare
> > > > > > > > > > > > >> > > with
> > > > > > > > > > > > >> > > > > > Max a
> > > > > > > > > > > > >> > > > > > > get an idea of the window of results,
> > > though
> > > > > > > current
> > > > > > > > > > > > >> latencies in
> > > > > > > > > > > > >> > > > > > Connector
> > > > > > > > > > > > >> > > > > > > do not include Min, and that's why I
> > > haven't
> > > > > > added
> > > > > > > > it
> > > > > > > > > for
> > > > > > > > > > > > >> convert
> > > > > > > > > > > > >> > > > > > latency.
> > > > > > > > > > > > >> > > > > > > Do you think it make sense to extend
> > > latency
> > > > > > > metrics
> > > > > > > > > with
> > > > > > > > > > > > Min?
> > > > > > > > > > > > >> > > > > > >
> > > > > > > > > > > > >> > > > > > > KIP is updated to clarify some of
> these
> > > > > changes.
> > > > > > > > > > > > >> > > > > > >
> > > > > > > > > > > > >> > > > > > > Many thanks,
> > > > > > > > > > > > >> > > > > > > Jorge.
> > > > > > > > > > > > >> > > > > > >
> > > > > > > > > > > > >> > > > > > > On Thu, 1 Sept 2022 at 18:11, Yash
> > Mayya <
> > > > > > > > > > > > >> yash.mayya@gmail.com>
> > > > > > > > > > > > >> > > > wrote:
> > > > > > > > > > > > >> > > > > > >
> > > > > > > > > > > > >> > > > > > > > Hi Jorge,
> > > > > > > > > > > > >> > > > > > > >
> > > > > > > > > > > > >> > > > > > > > Thanks for the KIP! I have the same
> > > > > confusion
> > > > > > > with
> > > > > > > > > the
> > > > > > > > > > > > >> > > e2e-latency
> > > > > > > > > > > > >> > > > > > > metrics
> > > > > > > > > > > > >> > > > > > > > as Sagar above. "e2e" would seem to
> > > > indicate
> > > > > > the
> > > > > > > > > latency
> > > > > > > > > > > > >> > between
> > > > > > > > > > > > >> > > > when
> > > > > > > > > > > > >> > > > > > the
> > > > > > > > > > > > >> > > > > > > > record was written to Kafka and when
> > the
> > > > > > record
> > > > > > > > was
> > > > > > > > > > > > written
> > > > > > > > > > > > >> to
> > > > > > > > > > > > >> > > the
> > > > > > > > > > > > >> > > > > sink
> > > > > > > > > > > > >> > > > > > > > system by the connector - however,
> as
> > > per
> > > > > the
> > > > > > > KIP
> > > > > > > > it
> > > > > > > > > > > looks
> > > > > > > > > > > > >> like
> > > > > > > > > > > > >> > > it
> > > > > > > > > > > > >> > > > > will
> > > > > > > > > > > > >> > > > > > > be
> > > > > > > > > > > > >> > > > > > > > the latency between when the record
> > was
> > > > > > written
> > > > > > > to
> > > > > > > > > Kafka
> > > > > > > > > > > > and
> > > > > > > > > > > > >> > when
> > > > > > > > > > > > >> > > > the
> > > > > > > > > > > > >> > > > > > > > record is returned by a sink task's
> > > > > consumer's
> > > > > > > > > poll? I
> > > > > > > > > > > > think
> > > > > > > > > > > > >> > that
> > > > > > > > > > > > >> > > > > > metric
> > > > > > > > > > > > >> > > > > > > > will be a little confusing to
> > interpret.
> > > > One
> > > > > > > more
> > > > > > > > > thing
> > > > > > > > > > > -
> > > > > > > > > > > > I
> > > > > > > > > > > > >> was
> > > > > > > > > > > > >> > > > > > wondering
> > > > > > > > > > > > >> > > > > > > > if there's a particular reason for
> > > having
> > > > a
> > > > > > min
> > > > > > > > > metric
> > > > > > > > > > > for
> > > > > > > > > > > > >> e2e
> > > > > > > > > > > > >> > > > > latency
> > > > > > > > > > > > >> > > > > > > but
> > > > > > > > > > > > >> > > > > > > > not for convert time?
> > > > > > > > > > > > >> > > > > > > >
> > > > > > > > > > > > >> > > > > > > > Thanks,
> > > > > > > > > > > > >> > > > > > > > Yash
> > > > > > > > > > > > >> > > > > > > >
> > > > > > > > > > > > >> > > > > > > > On Thu, Sep 1, 2022 at 8:59 PM
> Sagar <
> > > > > > > > > > > > >> > sagarmeansocean@gmail.com>
> > > > > > > > > > > > >> > > > > > wrote:
> > > > > > > > > > > > >> > > > > > > >
> > > > > > > > > > > > >> > > > > > > > > Hi Jorge,
> > > > > > > > > > > > >> > > > > > > > >
> > > > > > > > > > > > >> > > > > > > > > Thanks for the KIP. It looks like
> a
> > > very
> > > > > > good
> > > > > > > > > > > addition.
> > > > > > > > > > > > I
> > > > > > > > > > > > >> > > skimmed
> > > > > > > > > > > > >> > > > > > > through
> > > > > > > > > > > > >> > > > > > > > > once and had a couple of questions
> > =>
> > > > > > > > > > > > >> > > > > > > > >
> > > > > > > > > > > > >> > > > > > > > > 1) I am assuming the new metrics
> > would
> > > > be
> > > > > > task
> > > > > > > > > level
> > > > > > > > > > > > >> metric.
> > > > > > > > > > > > >> > > > Could
> > > > > > > > > > > > >> > > > > > you
> > > > > > > > > > > > >> > > > > > > > > specify the way it's done for
> other
> > > > > > > sink/source
> > > > > > > > > > > > connector?
> > > > > > > > > > > > >> > > > > > > > > 2) I am slightly confused about
> the
> > > e2e
> > > > > > > latency
> > > > > > > > > > > metric.
> > > > > > > > > > > > >> Let's
> > > > > > > > > > > > >> > > > > > consider
> > > > > > > > > > > > >> > > > > > > > the
> > > > > > > > > > > > >> > > > > > > > > sink connector metric. If I look
> at
> > > the
> > > > > way
> > > > > > > it's
> > > > > > > > > > > > supposed
> > > > > > > > > > > > >> to
> > > > > > > > > > > > >> > be
> > > > > > > > > > > > >> > > > > > > > calculated,
> > > > > > > > > > > > >> > > > > > > > > i.e the difference between the
> > record
> > > > > > > timestamp
> > > > > > > > > and
> > > > > > > > > > > the
> > > > > > > > > > > > >> wall
> > > > > > > > > > > > >> > > > clock
> > > > > > > > > > > > >> > > > > > > time,
> > > > > > > > > > > > >> > > > > > > > it
> > > > > > > > > > > > >> > > > > > > > > looks like a per record metric.
> > > However,
> > > > > the
> > > > > > > > > put-batch
> > > > > > > > > > > > >> time
> > > > > > > > > > > > >> > > > > measures
> > > > > > > > > > > > >> > > > > > > the
> > > > > > > > > > > > >> > > > > > > > > time to put a batch of records to
> > > > external
> > > > > > > sink.
> > > > > > > > > So, I
> > > > > > > > > > > > >> would
> > > > > > > > > > > > >> > > > assume
> > > > > > > > > > > > >> > > > > > > the 2
> > > > > > > > > > > > >> > > > > > > > > can't be added as is to compute
> the
> > > e2e
> > > > > > > latency.
> > > > > > > > > > > Maybe I
> > > > > > > > > > > > >> am
> > > > > > > > > > > > >> > > > missing
> > > > > > > > > > > > >> > > > > > > > > something here. Could you plz
> > clarify
> > > > > this.
> > > > > > > > > > > > >> > > > > > > > >
> > > > > > > > > > > > >> > > > > > > > > Thanks!
> > > > > > > > > > > > >> > > > > > > > > Sagar.
> > > > > > > > > > > > >> > > > > > > > >
> > > > > > > > > > > > >> > > > > > > > > On Tue, Aug 30, 2022 at 8:43 PM
> > Jorge
> > > > > > Esteban
> > > > > > > > > Quilcate
> > > > > > > > > > > > >> Otoya
> > > > > > > > > > > > >> > <
> > > > > > > > > > > > >> > > > > > > > > quilcate.jorge@gmail.com> wrote:
> > > > > > > > > > > > >> > > > > > > > >
> > > > > > > > > > > > >> > > > > > > > > > Hi all,
> > > > > > > > > > > > >> > > > > > > > > >
> > > > > > > > > > > > >> > > > > > > > > > I'd like to start a discussion
> > > thread
> > > > on
> > > > > > > > > KIP-864:
> > > > > > > > > > > Add
> > > > > > > > > > > > >> > > > End-To-End
> > > > > > > > > > > > >> > > > > > > > Latency
> > > > > > > > > > > > >> > > > > > > > > > Metrics to Connectors.
> > > > > > > > > > > > >> > > > > > > > > > This KIP aims to improve the
> > metrics
> > > > > > > available
> > > > > > > > > on
> > > > > > > > > > > > Source
> > > > > > > > > > > > >> > and
> > > > > > > > > > > > >> > > > Sink
> > > > > > > > > > > > >> > > > > > > > > > Connectors to measure end-to-end
> > > > > latency,
> > > > > > > > > including
> > > > > > > > > > > > >> source
> > > > > > > > > > > > >> > > and
> > > > > > > > > > > > >> > > > > sink
> > > > > > > > > > > > >> > > > > > > > > record
> > > > > > > > > > > > >> > > > > > > > > > conversion time, and sink record
> > e2e
> > > > > > latency
> > > > > > > > > > > (similar
> > > > > > > > > > > > to
> > > > > > > > > > > > >> > > > KIP-613
> > > > > > > > > > > > >> > > > > > for
> > > > > > > > > > > > >> > > > > > > > > > Streams).
> > > > > > > > > > > > >> > > > > > > > > >
> > > > > > > > > > > > >> > > > > > > > > > The KIP is here:
> > > > > > > > > > > > >> > > > > > > > > >
> > > > > > > > > > > > >> > > > > > > > > >
> > > > > > > > > > > > >> > > > > > > > >
> > > > > > > > > > > > >> > > > > > > >
> > > > > > > > > > > > >> > > > > > >
> > > > > > > > > > > > >> > > > > >
> > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > >> > > >
> > > > > > > > > > > > >> > >
> > > > > > > > > > > > >> >
> > > > > > > > > > > > >>
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-864%3A+Add+End-To-End+Latency+Metrics+to+Connectors
> > > > > > > > > > > > >> > > > > > > > > >
> > > > > > > > > > > > >> > > > > > > > > > Please take a look and let me
> know
> > > > what
> > > > > > you
> > > > > > > > > think.
> > > > > > > > > > > > >> > > > > > > > > >
> > > > > > > > > > > > >> > > > > > > > > > Cheers,
> > > > > > > > > > > > >> > > > > > > > > > Jorge.
> > > > > > > > > > > > >> > > > > > > > > >
> > > > > > > > > > > > >> > > > > > > > >
> > > > > > > > > > > > >> > > > > > > >
> > > > > > > > > > > > >> > > > > > >
> > > > > > > > > > > > >> > > > > >
> > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > >> > > >
> > > > > > > > > > > > >> > >
> > > > > > > > > > > > >> >
> > > > > > > > > > > > >>
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>