Posted to dev@kafka.apache.org by Kai Huang <ka...@gmail.com> on 2021/07/28 00:53:14 UTC

Re: Propose a KIP to report "REAL" broker/consumer fetch latency?

Hi Ming, I am interested in the proposed capability for diagnosing Kafka latency issues and would like to continue the discussion. Do you mind if I take over this thread and follow up with the community?

On 2021/04/25 17:33:10, Ming Liu <mi...@gmail.com> wrote: 
> The idea I am trying right now is:
> 1. Add waitTimeMS to FetchResponse.
> 2. If the fetch has to wait in purgatory because of either
> replica.fetch.wait.max.ms or fetch.min.bytes, then waitTimeMS is set in
> the FetchResponse.
> 3. In the updateRequestMetrics() function, we special-case the Fetch
> response and subtract waitTimeMS from RemoteTime and TotalTime.
> Let me know if you have any suggestions or feedback. I would like to
> propose a KIP for that change.
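
To make step 3 concrete, here is a minimal sketch of the subtraction, assuming the wait time is available in milliseconds when the metrics are updated. The class and method names (FetchTimeAdjuster, excludeWait) are hypothetical and are not taken from the Kafka code base; only waitTimeMS, RemoteTime, and TotalTime come from the proposal above. Clamping at zero is also an assumption, added so that clock skew or rounding cannot yield a negative latency.

```java
// Hypothetical illustration of step 3; not Kafka's actual implementation.
final class FetchTimeAdjuster {
    private FetchTimeAdjuster() {}

    /**
     * Returns the measured time with the purgatory wait removed.
     * The result is clamped at zero (an assumption of this sketch).
     */
    static long excludeWait(long measuredMs, long waitTimeMs) {
        if (measuredMs < 0 || waitTimeMs < 0) {
            throw new IllegalArgumentException("times must be non-negative");
        }
        return Math.max(0L, measuredMs - waitTimeMs);
    }
}
```

With illustrative numbers, a fetch whose TotalTime is 520 ms and which spent 500 ms waiting in purgatory would be reported as excludeWait(520, 500), that is, 20 ms. The same adjustment would apply to RemoteTime.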
> 
> 
> On Sat, Apr 24, 2021 at 6:09 PM Israel Ekpo <is...@gmail.com> wrote:
> 
> > Hi Ming
> >
> > This would be a useful metric from a monitoring perspective, especially
> > when troubleshooting or diagnosing issues.
> >
> > Are you looking to modify the Admin API to add this capability?
> > The metrics for quorum controllers, brokers, replicas and consumers may
> > need to be reported differently.
> >
> > I am interested in this capability as well.
> >
> > Maybe there is something in the current Admin API that is not obvious yet
> > so I will need to investigate first and will get back to you with my
> > thoughts/suggestions.
> >
> > Thanks for bringing this up
> >
> > Cheers
> >
> >
> >
> > On Sat, Apr 24, 2021 at 1:21 PM Ming Liu <mi...@gmail.com> wrote:
> >
> >> Hi All,
> >>      I am thinking about starting a KIP to report "REAL" broker/consumer
> >> fetch latency. Before that, I would like to collect any ideas or
> >> suggestions. I created https://issues.apache.org/jira/browse/KAFKA-12713.
> >>      Fetch latency is an important metric for monitoring cluster
> >> performance. With acks=all, the produce latency is affected primarily by
> >> broker fetch latency. However, the currently reported fetch latency
> >> does not reflect the true fetch latency, because a fetch request
> >> sometimes has to stay in purgatory and wait for
> >> replica.fetch.wait.max.ms when data is not available. This greatly
> >> skews the real P50, P99, etc.
> >>
> >> I would like to propose a KIP to track the real fetch latency for both
> >> broker followers and consumers.
> >>
> >> Ming
> >>
> >
>