Posted to dev@kafka.apache.org by Luke Chen <sh...@gmail.com> on 2022/05/01 09:08:47 UTC

Re: [DISCUSS] KIP-831: Add metric for log recovery progress

Hi James,

Sorry for the late reply.

Yes, this is a good point: it is useful to know how many segments remain to
be recovered when some partitions are very large.
I've updated the KIP to add a `*RemainingSegmentsToRecover*` metric for
each log recovery thread to expose this value.
The example in the Proposed Changes section here
<https://cwiki.apache.org/confluence/display/KAFKA/KIP-831%3A+Add+metric+for+log+recovery+progress#KIP831:Addmetricforlogrecoveryprogress-ProposedChanges>
shows what the metrics will look like.

Thanks for the suggestion.

Thank you.
Luke



On Sat, Apr 23, 2022 at 8:54 AM James Cheng <wu...@gmail.com> wrote:

> The KIP describes RemainingLogsToRecovery, which seems to be the number of
> partitions in each log.dir.
>
> We have some partitions which are much much larger than others. Those
> large partitions have many many more segments than others.
>
> Is there a way the metric can reflect partition size? Could it be
> RemainingSegmentsToRecover? Or even RemainingBytesToRecover?
>
> -James
>
> Sent from my iPhone
>
> > On Apr 20, 2022, at 2:01 AM, Luke Chen <sh...@gmail.com> wrote:
> >
> > Hi all,
> >
> > I'd like to propose a KIP to expose a metric for log recovery progress.
> > This metric would let the admins have a way to monitor the log recovery
> > progress.
> > Details can be found here:
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-831%3A+Add+metric+for+log+recovery+progress
> >
> > Any feedback is appreciated.
> >
> > Thank you.
> > Luke
>

Re: [DISCUSS] KIP-831: Add metric for log recovery progress

Posted by Federico Valeri <fe...@gmail.com>.
Hi Luke, thanks for the KIP.

I think the "dir" key is missing from the "remainingLogsToRecover" ObjectName. The patterns should be:

kafka.log:type=LogManager,name=remainingLogsToRecover,dir=([-._\/\w\d\s]+)
kafka.log:type=LogManager,name=remainingSegmentsToRecover,dir=([-._\/\w\d\s]+),threadNum=([0-9]+)

Example:

Broker configuration:
log.dirs=/tmp/log1,/tmp/log2
num.recovery.threads.per.data.dir=2

Registered ObjectNames:
kafka.log:type=LogManager,name=remainingLogsToRecover,dir=/tmp/log1
kafka.log:type=LogManager,name=remainingLogsToRecover,dir=/tmp/log2
kafka.log:type=LogManager,name=remainingSegmentsToRecover,dir=/tmp/log1,threadNum=0
kafka.log:type=LogManager,name=remainingSegmentsToRecover,dir=/tmp/log1,threadNum=1
kafka.log:type=LogManager,name=remainingSegmentsToRecover,dir=/tmp/log2,threadNum=0
kafka.log:type=LogManager,name=remainingSegmentsToRecover,dir=/tmp/log2,threadNum=1
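
As a quick illustration (not part of the KIP), MBeans registered under these
names could be read over JMX with a pattern query like the sketch below. The
gauge attribute name "Value" and the localhost:9999 endpoint are assumptions
for the example, not something the KIP specifies:

import java.util.Set;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class LogRecoveryProgress {
    public static void main(String[] args) throws Exception {
        // Example JMX endpoint; adjust host/port to the broker's JMX settings.
        JMXServiceURL url =
            new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            // Match every per-dir / per-thread recovery metric proposed above.
            ObjectName pattern = new ObjectName(
                "kafka.log:type=LogManager,name=remainingSegmentsToRecover,*");
            Set<ObjectName> names = conn.queryNames(pattern, null);
            for (ObjectName name : names) {
                // Gauges are assumed to expose a "Value" attribute.
                Object value = conn.getAttribute(name, "Value");
                System.out.println(name + " => " + value);
            }
        }
    }
}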

On Sat, Jun 4, 2022 at 4:42 AM Luke Chen <sh...@gmail.com> wrote:
>
> Hi Jun,
>
> Thanks for the comment.
>
> I've updated the KIP as:
> 1. remainingLogsToRecover*y* -> remainingLogsToRecover
> 2. remainingSegmentsToRecover*y* -> remainingSegmentsToRecover
> 3. The description of remainingSegmentsToRecover: The remaining segments
> for the current log assigned to the recovery thread.
>
> Thank you.
> Luke
>
> On Sat, Jun 4, 2022 at 12:54 AM Jun Rao <ju...@confluent.io.invalid> wrote:
>
> > Hi, Luke,
> >
> > Thanks for the explanation.
> >
> > 10. It makes sense to me now. Instead of using a longer name, perhaps we
> > could keep the current name, but make the description clear that it's the
> > remaining segments for the current log assigned to a thread. Also, would it
> > be better to use ToRecover instead of ToRecovery?
> >
> > Thanks,
> >
> > Jun
> >
> > On Fri, Jun 3, 2022 at 1:18 AM Luke Chen <sh...@gmail.com> wrote:
> >
> > > Hi Jun,
> > >
> > > > how do we implement kafka.log
> > >
> > >
> > :type=LogManager,name=remainingSegmentsToRecovery,dir=([-._\/\w\d\s]+),threadNum=([0-9]+),
> > > which tracks at the segment level?
> > >
> > > It looks like the name is misleading.
> > > Suppose we have 2 log recovery threads
> > > (num.recovery.threads.per.data.dir=2),
> > > and 10 logs to iterate through
> > > As mentioned before, we don't know how many segments in each log until
> > the
> > > log is iterated(loaded)
> > > So, when thread-1 iterates logA, it gets the segments to recover, and
> > > expose the number to `remainingSegmentsToRecovery` metric.
> > > And the same for thread-2 iterating logB.
> > > That is, the metric showed in `remainingSegmentsToRecovery` is actually
> > the
> > > remaining segments to recover "in a specific log".
> > >
> > > Maybe I should rename it: remainingSegmentsToRecovery ->
> > > remainingSegmentsToRecoverInCurrentLog
> > > WDYT?
> > >
> > > Thank you.
> > > Luke
> > >
> > > On Fri, Jun 3, 2022 at 1:27 AM Jun Rao <ju...@confluent.io.invalid> wrote:
> > >
> > > > Hi, Luke,
> > > >
> > > > Thanks for the reply.
> > > >
> > > > 10. You are saying it's difficult to track the number of segments to
> > > > recover. But how do we
> > > > implement
> > > >
> > >
> > kafka.log:type=LogManager,name=remainingSegmentsToRecovery,dir=([-._\/\w\d\s]+),threadNum=([0-9]+),
> > > > which tracks at the segment level?
> > > >
> > > > Jun
> > > >
> > > > On Thu, Jun 2, 2022 at 3:39 AM Luke Chen <sh...@gmail.com> wrote:
> > > >
> > > > > Hi Jun,
> > > > >
> > > > > Thanks for the comment.
> > > > >
> > > > > Yes, I've tried to work on this way to track the number of remaining
> > > > > segments, but it will change the design in UnifiedLog, so I only
> > track
> > > > the
> > > > > logs number.
> > > > > Currently, we will load all segments and recover those segments if
> > > needed
> > > > > "during creating UnifiedLog instance". And also get the log offsets
> > > here
> > > > > <
> > > > >
> > > >
> > >
> > https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/UnifiedLog.scala#L1819-L1842
> > > > > >
> > > > > .
> > > > > That is, if we want to get all segments to be recovered before
> > running
> > > > log
> > > > > recovery, we need to break the logic in UnifiedLog, to create a
> > partial
> > > > > UnifiedLog instance, and add more info to it later, which I think is
> > > just
> > > > > making the codes more complicated.
> > > > >
> > > > >
> > > > > Thank you.
> > > > > Luke
> > > > >
> > > > >
> > > > >
> > > > > On Thu, May 26, 2022 at 2:57 AM Jun Rao <ju...@confluent.io.invalid>
> > > > wrote:
> > > > >
> > > > > > Hi, Luke,
> > > > > >
> > > > > > Thanks for the KIP. Just one comment.
> > > > > >
> > > > > > 10. For kafka.log:type=LogManager,name=remainingLogsToRecovery,
> > could
> > > > we
> > > > > > instead track the number of remaining segments? This monitors the
> > > > > progress
> > > > > > at a finer granularity and is also consistent with the thread level
> > > > > metric.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jun
> > > > > >
> > > > > > On Wed, May 25, 2022 at 7:47 AM Tom Bentley <tb...@redhat.com>
> > > > wrote:
> > > > > >
> > > > > > > Thanks Luke! LGTM.
> > > > > > >
> > > > > > > On Sun, 22 May 2022 at 05:18, Luke Chen <sh...@gmail.com>
> > wrote:
> > > > > > >
> > > > > > > > Hi Tom and Raman,
> > > > > > > >
> > > > > > > > Thanks for your comments.
> > > > > > > >
> > > > > > > > > 1. There's not a JIRA for this KIP (or the JIRA link needs
> > > > > updating).
> > > > > > > > 2. Similarly the link to this discussion thread needs updating.
> > > > > > > > > Please update the links to JIRA and the discussion thread.
> > > > > > > >
> > > > > > > > Yes, thanks for the reminder. I've updated the KIP.
> > > > > > > >
> > > > > > > > > 3. I wonder whether we need to keep these metrics (with value
> > > 0)
> > > > > once
> > > > > > > the
> > > > > > > > broker enters the running state. Do you see it as valuable? A
> > > > benefit
> > > > > > of
> > > > > > > > removing the metrics would be a reduction on storage required
> > for
> > > > > > metric
> > > > > > > > stores which are recording these metrics.
> > > > > > > >
> > > > > > > > Yes, removing the metrics after log recovery completed is a
> > good
> > > > > idea.
> > > > > > > > Updated the KIP.
> > > > > > > >
> > > > > > > > > 4. I think the KIP's public interfaces section could be a bit
> > > > > > clearer.
> > > > > > > > Previous KIPs which added metrics usually used a table, with
> > the
> > > > > MBean
> > > > > > > > name, metric type and description. SeeKIP-551 for example (or
> > > > > KIP-748,
> > > > > > > > KIP-608). Similarly you could use a table in the proposed
> > changes
> > > > > > section
> > > > > > > > rather than describing the tree you'd see in an MBean console.
> > > > > > > >
> > > > > > > > Good point! Updated the KIP to use a table to list the MBean
> > > name,
> > > > > > metric
> > > > > > > > type and descriptions.
> > > > > > > >
> > > > > > > >
> > > > > > > > Thank you.
> > > > > > > > Luke
> > > > > > > >
> > > > > > > > On Fri, May 20, 2022 at 9:13 AM Raman Verma
> > > > > > <rverma@confluent.io.invalid
> > > > > > > >
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi Luke,
> > > > > > > > >
> > > > > > > > > The change is useful and simple. Thanks.
> > > > > > > > > Please update the links to JIRA and the discussion thread.
> > > > > > > > >
> > > > > > > > > Best Regards,
> > > > > > > > > Raman Verma
> > > > > > > > >
> > > > > > > > > On Thu, May 19, 2022 at 8:57 AM Tom Bentley <
> > > tbentley@redhat.com
> > > > >
> > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > Hi Luke,
> > > > > > > > > >
> > > > > > > > > > Thanks for the KIP. I think the idea makes sense and would
> > > > > provide
> > > > > > > > useful
> > > > > > > > > > observability of log recovery. I have a few comments.
> > > > > > > > > >
> > > > > > > > > > 1. There's not a JIRA for this KIP (or the JIRA link needs
> > > > > > updating).
> > > > > > > > > > 2. Similarly the link to this discussion thread needs
> > > updating.
> > > > > > > > > > 3. I wonder whether we need to keep these metrics (with
> > value
> > > > 0)
> > > > > > once
> > > > > > > > the
> > > > > > > > > > broker enters the running state. Do you see it as
> > valuable? A
> > > > > > benefit
> > > > > > > > of
> > > > > > > > > > removing the metrics would be a reduction on storage
> > required
> > > > for
> > > > > > > > metric
> > > > > > > > > > stores which are recording these metrics.
> > > > > > > > > > 4. I think the KIP's public interfaces section could be a
> > bit
> > > > > > > clearer.
> > > > > > > > > > Previous KIPs which added metrics usually used a table,
> > with
> > > > the
> > > > > > > MBean
> > > > > > > > > > name, metric type and description. SeeKIP-551 for example
> > (or
> > > > > > > KIP-748,
> > > > > > > > > > KIP-608). Similarly you could use a table in the proposed
> > > > changes
> > > > > > > > section
> > > > > > > > > > rather than describing the tree you'd see in an MBean
> > > console.
> > > > > > > > > >
> > > > > > > > > > Kind regards,
> > > > > > > > > >
> > > > > > > > > > Tom
> > > > > > > > > >
> > > > > > > > > > On Wed, 11 May 2022 at 09:08, Luke Chen <showuon@gmail.com
> > >
> > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > > And if people start using RemainingLogs and
> > > > RemainingSegments
> > > > > > and
> > > > > > > > > then
> > > > > > > > > > > REALLY FEEL like they need RemainingBytes, then we can
> > > always
> > > > > add
> > > > > > > it
> > > > > > > > > in the
> > > > > > > > > > > future.
> > > > > > > > > > >
> > > > > > > > > > > +1
> > > > > > > > > > >
> > > > > > > > > > > Thanks James!
> > > > > > > > > > > Luke
> > > > > > > > > > >
> > > > > > > > > > > On Wed, May 11, 2022 at 3:57 PM James Cheng <
> > > > > > wushujames@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi Luke,
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks for the detailed explanation. I agree that the
> > > > current
> > > > > > > > > proposal of
> > > > > > > > > > > > RemainingLogs and RemainingSegments will greatly
> > improve
> > > > the
> > > > > > > > > situation,
> > > > > > > > > > > and
> > > > > > > > > > > > that we can go ahead with the KIP as is.
> > > > > > > > > > > >
> > > > > > > > > > > > If RemainingBytes were straight-forward to implement,
> > > then
> > > > > I’d
> > > > > > > like
> > > > > > > > > to
> > > > > > > > > > > > have it. But we can live without it for now. And if
> > > people
> > > > > > start
> > > > > > > > > using
> > > > > > > > > > > > RemainingLogs and RemainingSegments and then REALLY
> > FEEL
> > > > like
> > > > > > > they
> > > > > > > > > need
> > > > > > > > > > > > RemainingBytes, then we can always add it in the
> > future.
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks Luke, for the detailed explanation, and for
> > > > responding
> > > > > > to
> > > > > > > my
> > > > > > > > > > > > feedback!
> > > > > > > > > > > >
> > > > > > > > > > > > -James
> > > > > > > > > > > >
> > > > > > > > > > > > Sent from my iPhone
> > > > > > > > > > > >
> > > > > > > > > > > > > On May 10, 2022, at 6:48 AM, Luke Chen <
> > > > showuon@gmail.com>
> > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > Hi James and all,
> > > > > > > > > > > > >
> > > > > > > > > > > > > I checked again and I can see when creating
> > UnifiedLog,
> > > > we
> > > > > > > > > expected the
> > > > > > > > > > > > > logs/indexes/snapshots are in good state.
> > > > > > > > > > > > > So, I don't think we should break the current design
> > to
> > > > > > expose
> > > > > > > > the
> > > > > > > > > > > > > `RemainingBytesToRecovery`
> > > > > > > > > > > > > metric.
> > > > > > > > > > > > >
> > > > > > > > > > > > > If there is no other comments, I'll start a vote
> > within
> > > > > this
> > > > > > > > week.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thank you.
> > > > > > > > > > > > > Luke
> > > > > > > > > > > > >
> > > > > > > > > > > > >> On Fri, May 6, 2022 at 6:00 PM Luke Chen <
> > > > > showuon@gmail.com
> > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> Hi James,
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> Thanks for your input.
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> For the `RemainingBytesToRecovery` metric proposal,
> > I
> > > > > think
> > > > > > > > > there's
> > > > > > > > > > > one
> > > > > > > > > > > > >> thing I didn't make it clear.
> > > > > > > > > > > > >> Currently, when log manager start up, we'll try to
> > > load
> > > > > all
> > > > > > > logs
> > > > > > > > > > > > >> (segments), and during the log loading, we'll try to
> > > > > recover
> > > > > > > > logs
> > > > > > > > > if
> > > > > > > > > > > > >> necessary.
> > > > > > > > > > > > >> And the logs loading is using "thread pool" as you
> > > > > thought.
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> So, here's the problem:
> > > > > > > > > > > > >> All segments in each log folder (partition) will be
> > > > loaded
> > > > > > in
> > > > > > > > > each log
> > > > > > > > > > > > >> recovery thread, and until it's loaded, we can know
> > > how
> > > > > many
> > > > > > > > > segments
> > > > > > > > > > > > (or
> > > > > > > > > > > > >> how many Bytes) needed to recover.
> > > > > > > > > > > > >> That means, if we have 10 partition logs in one
> > > broker,
> > > > > and
> > > > > > we
> > > > > > > > > have 2
> > > > > > > > > > > > log
> > > > > > > > > > > > >> recovery threads
> > > (num.recovery.threads.per.data.dir=2),
> > > > > > before
> > > > > > > > the
> > > > > > > > > > > > >> threads load the segments in each log, we only know
> > > how
> > > > > many
> > > > > > > > logs
> > > > > > > > > > > > >> (partitions) we have in the broker (i.e.
> > > > > > > RemainingLogsToRecover
> > > > > > > > > > > metric).
> > > > > > > > > > > > >> We cannot know how many segments/Bytes needed to
> > > recover
> > > > > > until
> > > > > > > > > each
> > > > > > > > > > > > thread
> > > > > > > > > > > > >> starts to load the segments under one log
> > (partition).
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> So, the example in the KIP, it shows:
> > > > > > > > > > > > >> Currently, there are still 5 logs (partitions)
> > needed
> > > to
> > > > > > > recover
> > > > > > > > > under
> > > > > > > > > > > > >> /tmp/log1 dir. And there are 2 threads doing the
> > jobs,
> > > > > where
> > > > > > > one
> > > > > > > > > > > thread
> > > > > > > > > > > > has
> > > > > > > > > > > > >> 10000 segments needed to recover, and the other one
> > > has
> > > > 3
> > > > > > > > segments
> > > > > > > > > > > > needed
> > > > > > > > > > > > >> to recover.
> > > > > > > > > > > > >>
> > > > > > > > > > > > >>   - kafka.log
> > > > > > > > > > > > >>      - LogManager
> > > > > > > > > > > > >>         - RemainingLogsToRecover
> > > > > > > > > > > > >>            - /tmp/log1 => 5            ← there are 5
> > > > logs
> > > > > > > under
> > > > > > > > > > > > >>            /tmp/log1 needed to be recovered
> > > > > > > > > > > > >>            - /tmp/log2 => 0
> > > > > > > > > > > > >>         - RemainingSegmentsToRecover
> > > > > > > > > > > > >>            - /tmp/log1                     ← 2
> > threads
> > > > are
> > > > > > > doing
> > > > > > > > > log
> > > > > > > > > > > > >>            recovery for /tmp/log1
> > > > > > > > > > > > >>            - 0 => 10000         ← there are 10000
> > > > segments
> > > > > > > > needed
> > > > > > > > > to
> > > > > > > > > > > be
> > > > > > > > > > > > >>               recovered for thread 0
> > > > > > > > > > > > >>               - 1 => 3
> > > > > > > > > > > > >>               - /tmp/log2
> > > > > > > > > > > > >>               - 0 => 0
> > > > > > > > > > > > >>               - 1 => 0
> > > > > > > > > > > > >>
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> So, after a while, the metrics might look like this:
> > > > > > > > > > > > >> It said, now, there are only 4 logs needed to
> > recover
> > > in
> > > > > > > > > /tmp/log1,
> > > > > > > > > > > and
> > > > > > > > > > > > >> the thread 0 has 9000 segments left, and thread 1
> > has
> > > 5
> > > > > > > segments
> > > > > > > > > left
> > > > > > > > > > > > >> (which should imply the thread already completed 2
> > > logs
> > > > > > > recovery
> > > > > > > > > in
> > > > > > > > > > > the
> > > > > > > > > > > > >> period)
> > > > > > > > > > > > >>
> > > > > > > > > > > > >>   - kafka.log
> > > > > > > > > > > > >>      - LogManager
> > > > > > > > > > > > >>         - RemainingLogsToRecover
> > > > > > > > > > > > >>            - /tmp/log1 => 3            ← there are 3
> > > > logs
> > > > > > > under
> > > > > > > > > > > > >>            /tmp/log1 needed to be recovered
> > > > > > > > > > > > >>            - /tmp/log2 => 0
> > > > > > > > > > > > >>         - RemainingSegmentsToRecover
> > > > > > > > > > > > >>            - /tmp/log1                     ← 2
> > threads
> > > > are
> > > > > > > doing
> > > > > > > > > log
> > > > > > > > > > > > >>            recovery for /tmp/log1
> > > > > > > > > > > > >>            - 0 => 9000         ← there are 9000
> > > segments
> > > > > > > needed
> > > > > > > > > to be
> > > > > > > > > > > > >>               recovered for thread 0
> > > > > > > > > > > > >>               - 1 => 5
> > > > > > > > > > > > >>               - /tmp/log2
> > > > > > > > > > > > >>               - 0 => 0
> > > > > > > > > > > > >>               - 1 => 0
> > > > > > > > > > > > >>
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> That said, the `RemainingBytesToRecovery` metric is
> > > > > > difficult
> > > > > > > to
> > > > > > > > > > > achieve
> > > > > > > > > > > > >> as you expected. I think the current proposal with
> > > > > > > > > > > > `RemainingLogsToRecover`
> > > > > > > > > > > > >> and `RemainingSegmentsToRecover` should already
> > > provide
> > > > > > enough
> > > > > > > > > info
> > > > > > > > > > > for
> > > > > > > > > > > > >> the log recovery progress.
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> I've also updated the KIP example to make it clear.
> > > > > > > > > > > > >>
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> Thank you.
> > > > > > > > > > > > >> Luke
> > > > > > > > > > > > >>
> > > > > > > > > > > > >>
> > > > > > > > > > > > >>> On Thu, May 5, 2022 at 3:31 AM James Cheng <
> > > > > > > > wushujames@gmail.com
> > > > > > > > > >
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > >>>
> > > > > > > > > > > > >>> Hi Luke,
> > > > > > > > > > > > >>>
> > > > > > > > > > > > >>> Thanks for adding RemainingSegmentsToRecovery.
> > > > > > > > > > > > >>>
> > > > > > > > > > > > >>> Another thought: different topics can have
> > different
> > > > > > segment
> > > > > > > > > sizes. I
> > > > > > > > > > > > >>> don't know how common it is, but it is possible.
> > Some
> > > > > > topics
> > > > > > > > > might
> > > > > > > > > > > want
> > > > > > > > > > > > >>> small segment sizes to more granular expiration of
> > > > data.
> > > > > > > > > > > > >>>
> > > > > > > > > > > > >>> The downside of RemainingLogsToRecovery and
> > > > > > > > > > > RemainingSegmentsToRecovery
> > > > > > > > > > > > >>> is that the rate that they will decrement depends
> > on
> > > > the
> > > > > > > > > > > configuration
> > > > > > > > > > > > and
> > > > > > > > > > > > >>> patterns of the topics and partitions and segment
> > > > sizes.
> > > > > If
> > > > > > > > > someone
> > > > > > > > > > > is
> > > > > > > > > > > > >>> monitoring those metrics, they might see times
> > where
> > > > the
> > > > > > > metric
> > > > > > > > > > > > decrements
> > > > > > > > > > > > >>> slowly, followed by a burst where it decrements
> > > > quickly.
> > > > > > > > > > > > >>>
> > > > > > > > > > > > >>> What about RemainingBytesToRecovery? This would not
> > > > > depend
> > > > > > on
> > > > > > > > the
> > > > > > > > > > > > >>> configuration of the topic or of the data. It would
> > > > > > actually
> > > > > > > > be a
> > > > > > > > > > > > pretty
> > > > > > > > > > > > >>> good metric, because I think that this metric would
> > > > > change
> > > > > > > at a
> > > > > > > > > > > > constant
> > > > > > > > > > > > >>> rate (based on the disk I/O speed that the broker
> > > > > allocates
> > > > > > > to
> > > > > > > > > > > > recovery).
> > > > > > > > > > > > >>> Because it changes at a constant rate, you would be
> > > > able
> > > > > to
> > > > > > > use
> > > > > > > > > the
> > > > > > > > > > > > >>> rate-of-change to predict when it hits zero, which
> > > will
> > > > > let
> > > > > > > you
> > > > > > > > > know
> > > > > > > > > > > > when
> > > > > > > > > > > > >>> the broker is going to start up. Like, I would
> > > imagine
> > > > if
> > > > > > we
> > > > > > > > > graphed
> > > > > > > > > > > > >>> RemainingBytesToRecovery that we'd see a fairly
> > > > straight
> > > > > > line
> > > > > > > > > that is
> > > > > > > > > > > > >>> decrementing at a steady rate towards zero.
> > > > > > > > > > > > >>>
> > > > > > > > > > > > >>> What do you think about adding
> > > > RemainingBytesToRecovery?
> > > > > > > > > > > > >>>
> > > > > > > > > > > > >>> Or, what would you think about making the primary
> > > > metric
> > > > > be
> > > > > > > > > > > > >>> RemainingBytesToRecovery, and getting rid of the
> > > > others?
> > > > > > > > > > > > >>>
> > > > > > > > > > > > >>> I don't know if I personally would rather have all
> > 3
> > > > > > metrics,
> > > > > > > > or
> > > > > > > > > > > would
> > > > > > > > > > > > >>> just use RemainingBytesToRecovery. I'd too would
> > like
> > > > > more
> > > > > > > > > community
> > > > > > > > > > > > input
> > > > > > > > > > > > >>> on which of those metrics would be useful to
> > people.
> > > > > > > > > > > > >>>
> > > > > > > > > > > > >>> About the JMX metrics, you said that if
> > > > > > > > > > > > >>> num.recovery.threads.per.data.dir=2, that there
> > might
> > > > be
> > > > > a
> > > > > > > > > separate
> > > > > > > > > > > > >>> RemainingSegmentsToRecovery counter for each
> > thread.
> > > Is
> > > > > > that
> > > > > > > > > actually
> > > > > > > > > > > > how
> > > > > > > > > > > > >>> the data is structured within the Kafka recovery
> > > > threads?
> > > > > > > Does
> > > > > > > > > each
> > > > > > > > > > > > thread
> > > > > > > > > > > > >>> get a fixed set of partitions, or is there just one
> > > big
> > > > > > pool
> > > > > > > of
> > > > > > > > > > > > partitions
> > > > > > > > > > > > >>> that the threads all work on?
> > > > > > > > > > > > >>>
> > > > > > > > > > > > >>> As a more concrete example:
> > > > > > > > > > > > >>> * If I have 9 small partitions and 1 big partition,
> > > and
> > > > > > > > > > > > >>> num.recovery.threads.per.data.dir=2
> > > > > > > > > > > > >>> Does each thread get 5 partitions, which means one
> > > > thread
> > > > > > > will
> > > > > > > > > finish
> > > > > > > > > > > > >>> much sooner than the other?
> > > > > > > > > > > > >>> OR
> > > > > > > > > > > > >>> Do both threads just work on the set of 10
> > > partitions,
> > > > > > which
> > > > > > > > > means
> > > > > > > > > > > > likely
> > > > > > > > > > > > >>> 1 thread will be busy with the big partition, while
> > > the
> > > > > > other
> > > > > > > > one
> > > > > > > > > > > ends
> > > > > > > > > > > > up
> > > > > > > > > > > > >>> plowing through the 9 small partitions?
> > > > > > > > > > > > >>>
> > > > > > > > > > > > >>> If each thread gets assigned 5 partitions, then it
> > > > would
> > > > > > make
> > > > > > > > > sense
> > > > > > > > > > > > that
> > > > > > > > > > > > >>> each thread has its own counter.
> > > > > > > > > > > > >>> If the threads works on a single pool of 10
> > > partitions,
> > > > > > then
> > > > > > > it
> > > > > > > > > would
> > > > > > > > > > > > >>> probably mean that the counter is on the pool of
> > > > > partitions
> > > > > > > > > itself,
> > > > > > > > > > > > and not
> > > > > > > > > > > > >>> on each thread.
> > > > > > > > > > > > >>>
> > > > > > > > > > > > >>> -James
> > > > > > > > > > > > >>>
> > > > > > > > > > > > >>>> On May 4, 2022, at 5:55 AM, Luke Chen <
> > > > > showuon@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > > > > > >>>>
> > > > > > > > > > > > >>>> Hi devs,
> > > > > > > > > > > > >>>>
> > > > > > > > > > > > >>>> If there are no other comments, I'll start a vote
> > > > > > tomorrow.
> > > > > > > > > > > > >>>>
> > > > > > > > > > > > >>>> Thank you.
> > > > > > > > > > > > >>>> Luke
> > > > > > > > > > > > >>>>
> > > > > > > > > > > > >>>> On Sun, May 1, 2022 at 5:08 PM Luke Chen <
> > > > > > showuon@gmail.com
> > > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > > > > >>>>
> > > > > > > > > > > > >>>>> Hi James,
> > > > > > > > > > > > >>>>>
> > > > > > > > > > > > >>>>> Sorry for the late reply.
> > > > > > > > > > > > >>>>>
> > > > > > > > > > > > >>>>> Yes, this is a good point, to know how many
> > > segments
> > > > to
> > > > > > be
> > > > > > > > > > > recovered
> > > > > > > > > > > > if
> > > > > > > > > > > > >>>>> there are some large partitions.
> > > > > > > > > > > > >>>>> I've updated the KIP, to add a
> > > > > > > `*RemainingSegmentsToRecover*`
> > > > > > > > > > > metric
> > > > > > > > > > > > >>> for
> > > > > > > > > > > > >>>>> each log recovery thread, to show the value.
> > > > > > > > > > > > >>>>> The example in the Proposed section here
> > > > > > > > > > > > >>>>> <
> > > > > > > > > > > > >>>
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-831%3A+Add+metric+for+log+recovery+progress#KIP831:Addmetricforlogrecoveryprogress-ProposedChanges
> > > > > > > > > > > > >>>>
> > > > > > > > > > > > >>>>> shows what it will look like.
> > > > > > > > > > > > >>>>>
> > > > > > > > > > > > >>>>> Thanks for the suggestion.
> > > > > > > > > > > > >>>>>
> > > > > > > > > > > > >>>>> Thank you.
> > > > > > > > > > > > >>>>> Luke
> > > > > > > > > > > > >>>>>
> > > > > > > > > > > > >>>>>
> > > > > > > > > > > > >>>>>
> > > > > > > > > > > > >>>>> On Sat, Apr 23, 2022 at 8:54 AM James Cheng <
> > > > > > > > > wushujames@gmail.com>
> > > > > > > > > > > > >>> wrote:
> > > > > > > > > > > > >>>>>
> > > > > > > > > > > > >>>>>> The KIP describes RemainingLogsToRecovery, which
> > > > seems
> > > > > > to
> > > > > > > be
> > > > > > > > > the
> > > > > > > > > > > > >>> number
> > > > > > > > > > > > >>>>>> of partitions in each log.dir.
> > > > > > > > > > > > >>>>>>
> > > > > > > > > > > > >>>>>> We have some partitions which are much much
> > larger
> > > > > than
> > > > > > > > > others.
> > > > > > > > > > > > Those
> > > > > > > > > > > > >>>>>> large partitions have many many more segments
> > than
> > > > > > others.
> > > > > > > > > > > > >>>>>>
> > > > > > > > > > > > >>>>>> Is there a way the metric can reflect partition
> > > > size?
> > > > > > > Could
> > > > > > > > > it be
> > > > > > > > > > > > >>>>>> RemainingSegmentsToRecover? Or even
> > > > > > > RemainingBytesToRecover?
> > > > > > > > > > > > >>>>>>
> > > > > > > > > > > > >>>>>> -James
> > > > > > > > > > > > >>>>>>
> > > > > > > > > > > > >>>>>> Sent from my iPhone
> > > > > > > > > > > > >>>>>>
> > > > > > > > > > > > >>>>>>> On Apr 20, 2022, at 2:01 AM, Luke Chen <
> > > > > > > showuon@gmail.com>
> > > > > > > > > > > wrote:
> > > > > > > > > > > > >>>>>>>
> > > > > > > > > > > > >>>>>>> Hi all,
> > > > > > > > > > > > >>>>>>>
> > > > > > > > > > > > >>>>>>> I'd like to propose a KIP to expose a metric
> > for
> > > > log
> > > > > > > > recovery
> > > > > > > > > > > > >>> progress.
> > > > > > > > > > > > >>>>>>> This metric would let the admins have a way to
> > > > > monitor
> > > > > > > the
> > > > > > > > > log
> > > > > > > > > > > > >>> recovery
> > > > > > > > > > > > >>>>>>> progress.
> > > > > > > > > > > > >>>>>>> Details can be found here:
> > > > > > > > > > > > >>>>>>>
> > > > > > > > > > > > >>>>>>
> > > > > > > > > > > > >>>
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-831%3A+Add+metric+for+log+recovery+progress
> > > > > > > > > > > > >>>>>>>
> > > > > > > > > > > > >>>>>>> Any feedback is appreciated.
> > > > > > > > > > > > >>>>>>>
> > > > > > > > > > > > >>>>>>> Thank you.
> > > > > > > > > > > > >>>>>>> Luke
> > > > > > > > > > > > >>>>>>
> > > > > > > > > > > > >>>>>
> > > > > > > > > > > > >>>
> > > > > > > > > > > > >>>
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Best Regards,
> > > > > > > > > Raman Verma
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >

Re: [DISCUSS] KIP-831: Add metric for log recovery progress

Posted by Luke Chen <sh...@gmail.com>.
Hi Jun,

Thanks for the comment.

I've updated the KIP as follows:
1. remainingLogsToRecover*y* -> remainingLogsToRecover
2. remainingSegmentsToRecover*y* -> remainingSegmentsToRecover
3. The description of remainingSegmentsToRecover: The remaining segments
for the current log assigned to the recovery thread.
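
To illustrate that description (this is only a sketch, not the actual Kafka
implementation), a recovery thread could back such a gauge with a counter that
is reset for each newly assigned log and decremented per recovered segment.
All class and method names below are hypothetical:

import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: a per-thread counter matching the metric's described
// semantics. "Segment" and "recoverSegment" are stand-ins, not Kafka classes.
class RecoveryThreadProgress {
    // Backing value for one thread's remainingSegmentsToRecover gauge.
    private final AtomicInteger remainingSegmentsInCurrentLog = new AtomicInteger(0);

    void recoverLog(List<Segment> segmentsOfOneLog) {
        // Reset when the thread starts a new log: the gauge reports only the
        // segments of the log currently assigned to this thread.
        remainingSegmentsInCurrentLog.set(segmentsOfOneLog.size());
        for (Segment segment : segmentsOfOneLog) {
            recoverSegment(segment);
            remainingSegmentsInCurrentLog.decrementAndGet();
        }
    }

    int remainingSegmentsToRecover() {
        return remainingSegmentsInCurrentLog.get();
    }

    private void recoverSegment(Segment segment) { /* actual recovery elided */ }

    interface Segment {}
}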

Thank you.
Luke

On Sat, Jun 4, 2022 at 12:54 AM Jun Rao <ju...@confluent.io.invalid> wrote:

> Hi, Luke,
>
> Thanks for the explanation.
>
> 10. It makes sense to me now. Instead of using a longer name, perhaps we
> could keep the current name, but make the description clear that it's the
> remaining segments for the current log assigned to a thread. Also, would it
> be better to use ToRecover instead of ToRecovery?
>
> Thanks,
>
> Jun
>
> On Fri, Jun 3, 2022 at 1:18 AM Luke Chen <sh...@gmail.com> wrote:
>
> > Hi Jun,
> >
> > > how do we implement kafka.log
> >
> >
> :type=LogManager,name=remainingSegmentsToRecovery,dir=([-._\/\w\d\s]+),threadNum=([0-9]+),
> > which tracks at the segment level?
> >
> > It looks like the name is misleading.
> > Suppose we have 2 log recovery threads
> > (num.recovery.threads.per.data.dir=2),
> > and 10 logs to iterate through
> > As mentioned before, we don't know how many segments in each log until
> the
> > log is iterated(loaded)
> > So, when thread-1 iterates logA, it gets the segments to recover, and
> > expose the number to `remainingSegmentsToRecovery` metric.
> > And the same for thread-2 iterating logB.
> > That is, the metric showed in `remainingSegmentsToRecovery` is actually
> the
> > remaining segments to recover "in a specific log".
> >
> > Maybe I should rename it: remainingSegmentsToRecovery ->
> > remainingSegmentsToRecoverInCurrentLog
> > WDYT?
> >
> > Thank you.
> > Luke
> >
> > On Fri, Jun 3, 2022 at 1:27 AM Jun Rao <ju...@confluent.io.invalid> wrote:
> >
> > > Hi, Luke,
> > >
> > > Thanks for the reply.
> > >
> > > 10. You are saying it's difficult to track the number of segments to
> > > recover. But how do we
> > > implement
> > >
> >
> kafka.log:type=LogManager,name=remainingSegmentsToRecovery,dir=([-._\/\w\d\s]+),threadNum=([0-9]+),
> > > which tracks at the segment level?
> > >
> > > Jun
> > >
> > > On Thu, Jun 2, 2022 at 3:39 AM Luke Chen <sh...@gmail.com> wrote:
> > >
> > > > Hi Jun,
> > > >
> > > > Thanks for the comment.
> > > >
> > > > Yes, I've tried to work on this way to track the number of remaining
> > > > segments, but it will change the design in UnifiedLog, so I only
> track
> > > the
> > > > logs number.
> > > > Currently, we will load all segments and recover those segments if
> > needed
> > > > "during creating UnifiedLog instance". And also get the log offsets
> > here
> > > > <
> > > >
> > >
> >
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/UnifiedLog.scala#L1819-L1842
> > > > >
> > > > .
> > > > That is, if we want to get all segments to be recovered before
> running
> > > log
> > > > recovery, we need to break the logic in UnifiedLog, to create a
> partial
> > > > UnifiedLog instance, and add more info to it later, which I think is
> > just
> > > > making the codes more complicated.
> > > >
> > > >
> > > > Thank you.
> > > > Luke
> > > >
> > > >
> > > >
> > > > On Thu, May 26, 2022 at 2:57 AM Jun Rao <ju...@confluent.io.invalid>
> > > wrote:
> > > >
> > > > > Hi, Luke,
> > > > >
> > > > > Thanks for the KIP. Just one comment.
> > > > >
> > > > > 10. For kafka.log:type=LogManager,name=remainingLogsToRecovery,
> could
> > > we
> > > > > instead track the number of remaining segments? This monitors the
> > > > progress
> > > > > at a finer granularity and is also consistent with the thread level
> > > > metric.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jun
> > > > >
> > > > > On Wed, May 25, 2022 at 7:47 AM Tom Bentley <tb...@redhat.com>
> > > wrote:
> > > > >
> > > > > > Thanks Luke! LGTM.
> > > > > >
> > > > > > On Sun, 22 May 2022 at 05:18, Luke Chen <sh...@gmail.com>
> wrote:
> > > > > >
> > > > > > > Hi Tom and Raman,
> > > > > > >
> > > > > > > Thanks for your comments.
> > > > > > >
> > > > > > > > 1. There's not a JIRA for this KIP (or the JIRA link needs
> > > > updating).
> > > > > > > 2. Similarly the link to this discussion thread needs updating.
> > > > > > > > Please update the links to JIRA and the discussion thread.
> > > > > > >
> > > > > > > Yes, thanks for the reminder. I've updated the KIP.
> > > > > > >
> > > > > > > > 3. I wonder whether we need to keep these metrics (with value
> > 0)
> > > > once
> > > > > > the
> > > > > > > broker enters the running state. Do you see it as valuable? A
> > > benefit
> > > > > of
> > > > > > > removing the metrics would be a reduction on storage required
> for
> > > > > metric
> > > > > > > stores which are recording these metrics.
> > > > > > >
> > > > > > > Yes, removing the metrics after log recovery completed is a
> good
> > > > idea.
> > > > > > > Updated the KIP.
> > > > > > >
> > > > > > > > 4. I think the KIP's public interfaces section could be a bit
> > > > > clearer.
> > > > > > > Previous KIPs which added metrics usually used a table, with
> the
> > > > MBean
> > > > > > > name, metric type and description. SeeKIP-551 for example (or
> > > > KIP-748,
> > > > > > > KIP-608). Similarly you could use a table in the proposed
> changes
> > > > > section
> > > > > > > rather than describing the tree you'd see in an MBean console.
> > > > > > >
> > > > > > > Good point! Updated the KIP to use a table to list the MBean
> > name,
> > > > > metric
> > > > > > > type and descriptions.
> > > > > > >
> > > > > > >
> > > > > > > Thank you.
> > > > > > > Luke
> > > > > > >
> > > > > > > On Fri, May 20, 2022 at 9:13 AM Raman Verma
> > > > > <rverma@confluent.io.invalid
> > > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Luke,
> > > > > > > >
> > > > > > > > The change is useful and simple. Thanks.
> > > > > > > > Please update the links to JIRA and the discussion thread.
> > > > > > > >
> > > > > > > > Best Regards,
> > > > > > > > Raman Verma
> > > > > > > >
> > > > > > > > On Thu, May 19, 2022 at 8:57 AM Tom Bentley <
> > tbentley@redhat.com
> > > >
> > > > > > wrote:
> > > > > > > > >
> > > > > > > > > Hi Luke,
> > > > > > > > >
> > > > > > > > > Thanks for the KIP. I think the idea makes sense and would
> > > > provide
> > > > > > > useful
> > > > > > > > > observability of log recovery. I have a few comments.
> > > > > > > > >
> > > > > > > > > 1. There's not a JIRA for this KIP (or the JIRA link needs
> > > > > updating).
> > > > > > > > > 2. Similarly the link to this discussion thread needs
> > updating.
> > > > > > > > > 3. I wonder whether we need to keep these metrics (with
> value
> > > 0)
> > > > > once
> > > > > > > the
> > > > > > > > > broker enters the running state. Do you see it as
> valuable? A
> > > > > benefit
> > > > > > > of
> > > > > > > > > removing the metrics would be a reduction on storage
> required
> > > for
> > > > > > > metric
> > > > > > > > > stores which are recording these metrics.
> > > > > > > > > 4. I think the KIP's public interfaces section could be a
> bit
> > > > > > clearer.
> > > > > > > > > Previous KIPs which added metrics usually used a table,
> with
> > > the
> > > > > > MBean
> > > > > > > > > name, metric type and description. SeeKIP-551 for example
> (or
> > > > > > KIP-748,
> > > > > > > > > KIP-608). Similarly you could use a table in the proposed
> > > changes
> > > > > > > section
> > > > > > > > > rather than describing the tree you'd see in an MBean
> > console.
> > > > > > > > >
> > > > > > > > > Kind regards,
> > > > > > > > >
> > > > > > > > > Tom
> > > > > > > > >
> > > > > > > > > On Wed, 11 May 2022 at 09:08, Luke Chen <showuon@gmail.com
> >
> > > > wrote:
> > > > > > > > >
> > > > > > > > > > > And if people start using RemainingLogs and
> > > RemainingSegments
> > > > > and
> > > > > > > > then
> > > > > > > > > > REALLY FEEL like they need RemainingBytes, then we can
> > always
> > > > add
> > > > > > it
> > > > > > > > in the
> > > > > > > > > > future.
> > > > > > > > > >
> > > > > > > > > > +1
> > > > > > > > > >
> > > > > > > > > > Thanks James!
> > > > > > > > > > Luke
> > > > > > > > > >
> > > > > > > > > > On Wed, May 11, 2022 at 3:57 PM James Cheng <
> > > > > wushujames@gmail.com>
> > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi Luke,
> > > > > > > > > > >
> > > > > > > > > > > Thanks for the detailed explanation. I agree that the
> > > current
> > > > > > > > proposal of
> > > > > > > > > > > RemainingLogs and RemainingSegments will greatly
> improve
> > > the
> > > > > > > > situation,
> > > > > > > > > > and
> > > > > > > > > > > that we can go ahead with the KIP as is.
> > > > > > > > > > >
> > > > > > > > > > > If RemainingBytes were straight-forward to implement,
> > then
> > > > I’d
> > > > > > like
> > > > > > > > to
> > > > > > > > > > > have it. But we can live without it for now. And if
> > people
> > > > > start
> > > > > > > > using
> > > > > > > > > > > RemainingLogs and RemainingSegments and then REALLY
> FEEL
> > > like
> > > > > > they
> > > > > > > > need
> > > > > > > > > > > RemainingBytes, then we can always add it in the
> future.
> > > > > > > > > > >
> > > > > > > > > > > Thanks Luke, for the detailed explanation, and for
> > > responding
> > > > > to
> > > > > > my
> > > > > > > > > > > feedback!
> > > > > > > > > > >
> > > > > > > > > > > -James
> > > > > > > > > > >
> > > > > > > > > > > Sent from my iPhone
> > > > > > > > > > >
> > > > > > > > > > > > On May 10, 2022, at 6:48 AM, Luke Chen <
> > > showuon@gmail.com>
> > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > Hi James and all,
> > > > > > > > > > > >
> > > > > > > > > > > > I checked again and I can see when creating
> UnifiedLog,
> > > we
> > > > > > > > expected the
> > > > > > > > > > > > logs/indexes/snapshots are in good state.
> > > > > > > > > > > > So, I don't think we should break the current design
> to
> > > > > expose
> > > > > > > the
> > > > > > > > > > > > `RemainingBytesToRecovery`
> > > > > > > > > > > > metric.
> > > > > > > > > > > >
> > > > > > > > > > > > If there is no other comments, I'll start a vote
> within
> > > > this
> > > > > > > week.
> > > > > > > > > > > >
> > > > > > > > > > > > Thank you.
> > > > > > > > > > > > Luke
> > > > > > > > > > > >
> > > > > > > > > > > >> On Fri, May 6, 2022 at 6:00 PM Luke Chen <
> > > > showuon@gmail.com
> > > > > >
> > > > > > > > wrote:
> > > > > > > > > > > >>
> > > > > > > > > > > >> Hi James,
> > > > > > > > > > > >>
> > > > > > > > > > > >> Thanks for your input.
> > > > > > > > > > > >>
> > > > > > > > > > > >> For the `RemainingBytesToRecovery` metric proposal,
> I
> > > > think
> > > > > > > > there's
> > > > > > > > > > one
> > > > > > > > > > > >> thing I didn't make it clear.
> > > > > > > > > > > >> Currently, when log manager start up, we'll try to
> > load
> > > > all
> > > > > > logs
> > > > > > > > > > > >> (segments), and during the log loading, we'll try to
> > > > recover
> > > > > > > logs
> > > > > > > > if
> > > > > > > > > > > >> necessary.
> > > > > > > > > > > >> And the logs loading is using "thread pool" as you
> > > > thought.
> > > > > > > > > > > >>
> > > > > > > > > > > >> So, here's the problem:
> > > > > > > > > > > >> All segments in each log folder (partition) will be
> > > loaded
> > > > > in
> > > > > > > > each log
> > > > > > > > > > > >> recovery thread, and until it's loaded, we can know
> > how
> > > > many
> > > > > > > > segments
> > > > > > > > > > > (or
> > > > > > > > > > > >> how many Bytes) needed to recover.
> > > > > > > > > > > >> That means, if we have 10 partition logs in one
> > broker,
> > > > and
> > > > > we
> > > > > > > > have 2
> > > > > > > > > > > log
> > > > > > > > > > > >> recovery threads
> > (num.recovery.threads.per.data.dir=2),
> > > > > before
> > > > > > > the
> > > > > > > > > > > >> threads load the segments in each log, we only know
> > how
> > > > many
> > > > > > > logs
> > > > > > > > > > > >> (partitions) we have in the broker (i.e.
> > > > > > RemainingLogsToRecover
> > > > > > > > > > metric).
> > > > > > > > > > > >> We cannot know how many segments/Bytes needed to
> > recover
> > > > > until
> > > > > > > > each
> > > > > > > > > > > thread
> > > > > > > > > > > >> starts to load the segments under one log
> (partition).
> > > > > > > > > > > >>
> > > > > > > > > > > >> So, the example in the KIP, it shows:
> > > > > > > > > > > >> Currently, there are still 5 logs (partitions)
> needed
> > to
> > > > > > recover
> > > > > > > > under
> > > > > > > > > > > >> /tmp/log1 dir. And there are 2 threads doing the
> jobs,
> > > > where
> > > > > > one
> > > > > > > > > > thread
> > > > > > > > > > > has
> > > > > > > > > > > >> 10000 segments needed to recover, and the other one
> > has
> > > 3
> > > > > > > segments
> > > > > > > > > > > needed
> > > > > > > > > > > >> to recover.
> > > > > > > > > > > >>
> > > > > > > > > > > >>   - kafka.log
> > > > > > > > > > > >>      - LogManager
> > > > > > > > > > > >>         - RemainingLogsToRecover
> > > > > > > > > > > >>            - /tmp/log1 => 5            ← there are 5
> > > logs
> > > > > > under
> > > > > > > > > > > >>            /tmp/log1 needed to be recovered
> > > > > > > > > > > >>            - /tmp/log2 => 0
> > > > > > > > > > > >>         - RemainingSegmentsToRecover
> > > > > > > > > > > >>            - /tmp/log1                     ← 2
> threads
> > > are
> > > > > > doing
> > > > > > > > log
> > > > > > > > > > > >>            recovery for /tmp/log1
> > > > > > > > > > > >>            - 0 => 10000         ← there are 10000
> > > segments
> > > > > > > needed
> > > > > > > > to
> > > > > > > > > > be
> > > > > > > > > > > >>               recovered for thread 0
> > > > > > > > > > > >>               - 1 => 3
> > > > > > > > > > > >>               - /tmp/log2
> > > > > > > > > > > >>               - 0 => 0
> > > > > > > > > > > >>               - 1 => 0
> > > > > > > > > > > >>
> > > > > > > > > > > >>
> > > > > > > > > > > >> So, after a while, the metrics might look like this:
> > > > > > > > > > > >> It said, now, there are only 4 logs needed to
> recover
> > in
> > > > > > > > /tmp/log1,
> > > > > > > > > > and
> > > > > > > > > > > >> the thread 0 has 9000 segments left, and thread 1
> has
> > 5
> > > > > > segments
> > > > > > > > left
> > > > > > > > > > > >> (which should imply the thread already completed 2
> > logs
> > > > > > recovery
> > > > > > > > in
> > > > > > > > > > the
> > > > > > > > > > > >> period)
> > > > > > > > > > > >>
> > > > > > > > > > > >>   - kafka.log
> > > > > > > > > > > >>      - LogManager
> > > > > > > > > > > >>         - RemainingLogsToRecover
> > > > > > > > > > > >>            - /tmp/log1 => 3            ← there are 3
> > > logs
> > > > > > under
> > > > > > > > > > > >>            /tmp/log1 needed to be recovered
> > > > > > > > > > > >>            - /tmp/log2 => 0
> > > > > > > > > > > >>         - RemainingSegmentsToRecover
> > > > > > > > > > > >>            - /tmp/log1                     ← 2
> threads
> > > are
> > > > > > doing
> > > > > > > > log
> > > > > > > > > > > >>            recovery for /tmp/log1
> > > > > > > > > > > >>            - 0 => 9000         ← there are 9000
> > segments
> > > > > > needed
> > > > > > > > to be
> > > > > > > > > > > >>               recovered for thread 0
> > > > > > > > > > > >>               - 1 => 5
> > > > > > > > > > > >>               - /tmp/log2
> > > > > > > > > > > >>               - 0 => 0
> > > > > > > > > > > >>               - 1 => 0
> > > > > > > > > > > >>
> > > > > > > > > > > >>
> > > > > > > > > > > >> That said, the `RemainingBytesToRecovery` metric is
> > > > > difficult
> > > > > > to
> > > > > > > > > > achieve
> > > > > > > > > > > >> as you expected. I think the current proposal with
> > > > > > > > > > > `RemainingLogsToRecover`
> > > > > > > > > > > >> and `RemainingSegmentsToRecover` should already
> > provide
> > > > > enough
> > > > > > > > info
> > > > > > > > > > for
> > > > > > > > > > > >> the log recovery progress.
> > > > > > > > > > > >>
> > > > > > > > > > > >> I've also updated the KIP example to make it clear.
> > > > > > > > > > > >>
> > > > > > > > > > > >>
> > > > > > > > > > > >> Thank you.
> > > > > > > > > > > >> Luke
> > > > > > > > > > > >>
> > > > > > > > > > > >>
> > > > > > > > > > > >>> On Thu, May 5, 2022 at 3:31 AM James Cheng <
> > > > > > > wushujames@gmail.com
> > > > > > > > >
> > > > > > > > > > > wrote:
> > > > > > > > > > > >>>
> > > > > > > > > > > >>> Hi Luke,
> > > > > > > > > > > >>>
> > > > > > > > > > > >>> Thanks for adding RemainingSegmentsToRecovery.
> > > > > > > > > > > >>>
> > > > > > > > > > > >>> Another thought: different topics can have
> different
> > > > > segment
> > > > > > > > sizes. I
> > > > > > > > > > > >>> don't know how common it is, but it is possible.
> Some
> > > > > topics
> > > > > > > > might
> > > > > > > > > > want
> > > > > > > > > > > >>> small segment sizes to more granular expiration of
> > > data.
> > > > > > > > > > > >>>
> > > > > > > > > > > >>> The downside of RemainingLogsToRecovery and
> > > > > > > > > > RemainingSegmentsToRecovery
> > > > > > > > > > > >>> is that the rate that they will decrement depends
> on
> > > the
> > > > > > > > > > configuration
> > > > > > > > > > > and
> > > > > > > > > > > >>> patterns of the topics and partitions and segment
> > > sizes.
> > > > If
> > > > > > > > someone
> > > > > > > > > > is
> > > > > > > > > > > >>> monitoring those metrics, they might see times
> where
> > > the
> > > > > > metric
> > > > > > > > > > > decrements
> > > > > > > > > > > >>> slowly, followed by a burst where it decrements
> > > quickly.
> > > > > > > > > > > >>>
> > > > > > > > > > > >>> What about RemainingBytesToRecovery? This would not
> > > > depend
> > > > > on
> > > > > > > the
> > > > > > > > > > > >>> configuration of the topic or of the data. It would
> > > > > actually
> > > > > > > be a
> > > > > > > > > > > pretty
> > > > > > > > > > > >>> good metric, because I think that this metric would
> > > > change
> > > > > > at a
> > > > > > > > > > > constant
> > > > > > > > > > > >>> rate (based on the disk I/O speed that the broker
> > > > allocates
> > > > > > to
> > > > > > > > > > > recovery).
> > > > > > > > > > > >>> Because it changes at a constant rate, you would be
> > > able
> > > > to
> > > > > > use
> > > > > > > > the
> > > > > > > > > > > >>> rate-of-change to predict when it hits zero, which
> > will
> > > > let
> > > > > > you
> > > > > > > > know
> > > > > > > > > > > when
> > > > > > > > > > > >>> the broker is going to start up. Like, I would
> > imagine
> > > if
> > > > > we
> > > > > > > > graphed
> > > > > > > > > > > >>> RemainingBytesToRecovery that we'd see a fairly
> > > straight
> > > > > line
> > > > > > > > that is
> > > > > > > > > > > >>> decrementing at a steady rate towards zero.
> > > > > > > > > > > >>>
> > > > > > > > > > > >>> What do you think about adding
> > > RemainingBytesToRecovery?
> > > > > > > > > > > >>>
> > > > > > > > > > > >>> Or, what would you think about making the primary
> > > metric
> > > > be
> > > > > > > > > > > >>> RemainingBytesToRecovery, and getting rid of the
> > > others?
> > > > > > > > > > > >>>
> > > > > > > > > > > >>> I don't know if I personally would rather have all
> 3
> > > > > metrics,
> > > > > > > or
> > > > > > > > > > would
> > > > > > > > > > > >>> just use RemainingBytesToRecovery. I'd too would
> like
> > > > more
> > > > > > > > community
> > > > > > > > > > > input
> > > > > > > > > > > >>> on which of those metrics would be useful to
> people.
> > > > > > > > > > > >>>
> > > > > > > > > > > >>> About the JMX metrics, you said that if
> > > > > > > > > > > >>> num.recovery.threads.per.data.dir=2, that there
> might
> > > be
> > > > a
> > > > > > > > separate
> > > > > > > > > > > >>> RemainingSegmentsToRecovery counter for each
> thread.
> > Is
> > > > > that
> > > > > > > > actually
> > > > > > > > > > > how
> > > > > > > > > > > >>> the data is structured within the Kafka recovery
> > > threads?
> > > > > > Does
> > > > > > > > each
> > > > > > > > > > > thread
> > > > > > > > > > > >>> get a fixed set of partitions, or is there just one
> > big
> > > > > pool
> > > > > > of
> > > > > > > > > > > partitions
> > > > > > > > > > > >>> that the threads all work on?
> > > > > > > > > > > >>>
> > > > > > > > > > > >>> As a more concrete example:
> > > > > > > > > > > >>> * If I have 9 small partitions and 1 big partition,
> > and
> > > > > > > > > > > >>> num.recovery.threads.per.data.dir=2
> > > > > > > > > > > >>> Does each thread get 5 partitions, which means one
> > > thread
> > > > > > will
> > > > > > > > finish
> > > > > > > > > > > >>> much sooner than the other?
> > > > > > > > > > > >>> OR
> > > > > > > > > > > >>> Do both threads just work on the set of 10
> > partitions,
> > > > > which
> > > > > > > > means
> > > > > > > > > > > likely
> > > > > > > > > > > >>> 1 thread will be busy with the big partition, while
> > the
> > > > > other
> > > > > > > one
> > > > > > > > > > ends
> > > > > > > > > > > up
> > > > > > > > > > > >>> plowing through the 9 small partitions?
> > > > > > > > > > > >>>
> > > > > > > > > > > >>> If each thread gets assigned 5 partitions, then it
> > > would
> > > > > make
> > > > > > > > sense
> > > > > > > > > > > that
> > > > > > > > > > > >>> each thread has its own counter.
> > > > > > > > > > > >>> If the threads works on a single pool of 10
> > partitions,
> > > > > then
> > > > > > it
> > > > > > > > would
> > > > > > > > > > > >>> probably mean that the counter is on the pool of
> > > > partitions
> > > > > > > > itself,
> > > > > > > > > > > and not
> > > > > > > > > > > >>> on each thread.
> > > > > > > > > > > >>>
> > > > > > > > > > > >>> -James
> > > > > > > > > > > >>>
> > > > > > > > > > > >>>> On May 4, 2022, at 5:55 AM, Luke Chen <
> > > > showuon@gmail.com>
> > > > > > > > wrote:
> > > > > > > > > > > >>>>
> > > > > > > > > > > >>>> Hi devs,
> > > > > > > > > > > >>>>
> > > > > > > > > > > >>>> If there are no other comments, I'll start a vote
> > > > > tomorrow.
> > > > > > > > > > > >>>>
> > > > > > > > > > > >>>> Thank you.
> > > > > > > > > > > >>>> Luke
> > > > > > > > > > > >>>>
> > > > > > > > > > > >>>> On Sun, May 1, 2022 at 5:08 PM Luke Chen <
> > > > > showuon@gmail.com
> > > > > > >
> > > > > > > > wrote:
> > > > > > > > > > > >>>>
> > > > > > > > > > > >>>>> Hi James,
> > > > > > > > > > > >>>>>
> > > > > > > > > > > >>>>> Sorry for the late reply.
> > > > > > > > > > > >>>>>
> > > > > > > > > > > >>>>> Yes, this is a good point, to know how many
> > segments
> > > to
> > > > > be
> > > > > > > > > > recovered
> > > > > > > > > > > if
> > > > > > > > > > > >>>>> there are some large partitions.
> > > > > > > > > > > >>>>> I've updated the KIP, to add a
> > > > > > `*RemainingSegmentsToRecover*`
> > > > > > > > > > metric
> > > > > > > > > > > >>> for
> > > > > > > > > > > >>>>> each log recovery thread, to show the value.
> > > > > > > > > > > >>>>> The example in the Proposed section here
> > > > > > > > > > > >>>>> <
> > > > > > > > > > > >>>
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-831%3A+Add+metric+for+log+recovery+progress#KIP831:Addmetricforlogrecoveryprogress-ProposedChanges
> > > > > > > > > > > >>>>
> > > > > > > > > > > >>>>> shows what it will look like.
> > > > > > > > > > > >>>>>
> > > > > > > > > > > >>>>> Thanks for the suggestion.
> > > > > > > > > > > >>>>>
> > > > > > > > > > > >>>>> Thank you.
> > > > > > > > > > > >>>>> Luke
> > > > > > > > > > > >>>>>
> > > > > > > > > > > >>>>>
> > > > > > > > > > > >>>>>
> > > > > > > > > > > >>>>> On Sat, Apr 23, 2022 at 8:54 AM James Cheng <
> > > > > > > > wushujames@gmail.com>
> > > > > > > > > > > >>> wrote:
> > > > > > > > > > > >>>>>
> > > > > > > > > > > >>>>>> The KIP describes RemainingLogsToRecovery, which
> > > seems
> > > > > to
> > > > > > be
> > > > > > > > the
> > > > > > > > > > > >>> number
> > > > > > > > > > > >>>>>> of partitions in each log.dir.
> > > > > > > > > > > >>>>>>
> > > > > > > > > > > >>>>>> We have some partitions which are much much
> larger
> > > > than
> > > > > > > > others.
> > > > > > > > > > > Those
> > > > > > > > > > > >>>>>> large partitions have many many more segments
> than
> > > > > others.
> > > > > > > > > > > >>>>>>
> > > > > > > > > > > >>>>>> Is there a way the metric can reflect partition
> > > size?
> > > > > > Could
> > > > > > > > it be
> > > > > > > > > > > >>>>>> RemainingSegmentsToRecover? Or even
> > > > > > RemainingBytesToRecover?
> > > > > > > > > > > >>>>>>
> > > > > > > > > > > >>>>>> -James
> > > > > > > > > > > >>>>>>
> > > > > > > > > > > >>>>>> Sent from my iPhone
> > > > > > > > > > > >>>>>>
> > > > > > > > > > > >>>>>>> On Apr 20, 2022, at 2:01 AM, Luke Chen <
> > > > > > showuon@gmail.com>
> > > > > > > > > > wrote:
> > > > > > > > > > > >>>>>>>
> > > > > > > > > > > >>>>>>> Hi all,
> > > > > > > > > > > >>>>>>>
> > > > > > > > > > > >>>>>>> I'd like to propose a KIP to expose a metric
> for
> > > log
> > > > > > > recovery
> > > > > > > > > > > >>> progress.
> > > > > > > > > > > >>>>>>> This metric would let the admins have a way to
> > > > monitor
> > > > > > the
> > > > > > > > log
> > > > > > > > > > > >>> recovery
> > > > > > > > > > > >>>>>>> progress.
> > > > > > > > > > > >>>>>>> Details can be found here:
> > > > > > > > > > > >>>>>>>
> > > > > > > > > > > >>>>>>
> > > > > > > > > > > >>>
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-831%3A+Add+metric+for+log+recovery+progress
> > > > > > > > > > > >>>>>>>
> > > > > > > > > > > >>>>>>> Any feedback is appreciated.
> > > > > > > > > > > >>>>>>>
> > > > > > > > > > > >>>>>>> Thank you.
> > > > > > > > > > > >>>>>>> Luke
> > > > > > > > > > > >>>>>>
> > > > > > > > > > > >>>>>
> > > > > > > > > > > >>>
> > > > > > > > > > > >>>
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Best Regards,
> > > > > > > > Raman Verma
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-831: Add metric for log recovery progress

Posted by Jun Rao <ju...@confluent.io.INVALID>.
Hi, Luke,

Thanks for the explanation.

10. It makes sense to me now. Instead of using a longer name, perhaps we
could keep the current name, but make the description clear that it's the
remaining segments for the current log assigned to a thread. Also, would it
be better to use ToRecover instead of ToRecovery?
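
For illustration only (the exact wording and layout are up to the KIP, and the
Gauge type is an assumption here), the renamed per-thread entry in the KIP's
metrics table could then read roughly like:

  MBean name : kafka.log:type=LogManager,name=remainingSegmentsToRecover,dir=([-._\/\w\d\s]+),threadNum=([0-9]+)
  Metric type: Gauge
  Description: The remaining segments for the current log assigned to this
               recovery thread.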

Thanks,

Jun

On Fri, Jun 3, 2022 at 1:18 AM Luke Chen <sh...@gmail.com> wrote:

> Hi Jun,
>
> > how do we implement
> > kafka.log:type=LogManager,name=remainingSegmentsToRecovery,dir=([-._\/\w\d\s]+),threadNum=([0-9]+),
> > which tracks at the segment level?
>
> It looks like the name is misleading.
> Suppose we have 2 log recovery threads (num.recovery.threads.per.data.dir=2)
> and 10 logs to iterate through.
> As mentioned before, we don't know how many segments are in each log until
> the log is iterated (loaded).
> So, when thread-1 iterates logA, it gets the segments to recover and
> exposes that number via the `remainingSegmentsToRecovery` metric.
> And the same goes for thread-2 iterating logB.
> That is, the value shown in `remainingSegmentsToRecovery` is actually the
> number of remaining segments to recover "in a specific log".
>
> Maybe I should rename it: remainingSegmentsToRecovery ->
> remainingSegmentsToRecoverInCurrentLog
> WDYT?
>
> Thank you.
> Luke
>

Re: [DISCUSS] KIP-831: Add metric for log recovery progress

Posted by Luke Chen <sh...@gmail.com>.
Hi Jun,

> how do we implement
> kafka.log:type=LogManager,name=remainingSegmentsToRecovery,dir=([-._\/\w\d\s]+),threadNum=([0-9]+),
> which tracks at the segment level?

It looks like the name is misleading.
Suppose we have 2 log recovery threads (num.recovery.threads.per.data.dir=2)
and 10 logs to iterate through.
As mentioned before, we don't know how many segments are in each log until
the log is iterated (loaded).
So, when thread-1 iterates logA, it gets the segments to recover and
exposes that number via the `remainingSegmentsToRecovery` metric.
And the same goes for thread-2 iterating logB.
That is, the value shown in `remainingSegmentsToRecovery` is actually the
number of remaining segments to recover "in a specific log".
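
To make that concrete, here is a minimal standalone sketch of such a
per-thread gauge. It uses plain JMX rather than Kafka's own metrics helpers,
and the class names, the /tmp/log1 dir and threadNum=0 values are purely
illustrative, not the actual implementation:

import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class RecoveryGaugeSketch {

    // Minimal MBean exposing a single long value.
    public interface RemainingSegmentsMBean {
        long getValue();
    }

    public static class RemainingSegments implements RemainingSegmentsMBean {
        private final AtomicLong value = new AtomicLong(0);
        public long getValue() { return value.get(); }
        public void set(long v) { value.set(v); }
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        RemainingSegments gauge = new RemainingSegments();
        // One gauge per (log dir, recovery thread) pair.
        ObjectName name = new ObjectName(
            "kafka.log:type=LogManager,name=remainingSegmentsToRecovery,dir=/tmp/log1,threadNum=0");
        server.registerMBean(gauge, name);

        // Inside the recovery thread: only after a log is loaded do we know its
        // segment count, so the gauge is set per log and counts down while that
        // log's segments are recovered, then is reused for the next assigned log.
        long segmentsInCurrentLog = 10000; // known only after loading the log
        gauge.set(segmentsInCurrentLog);
        for (long remaining = segmentsInCurrentLog; remaining > 0; remaining--) {
            // ... recover one segment ...
            gauge.set(remaining - 1);
        }

        server.unregisterMBean(name); // removed once recovery completes
    }
}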

Maybe I should rename it: remainingSegmentsToRecovery ->
remainingSegmentsToRecoverInCurrentLog
WDYT?

Thank you.
Luke

On Fri, Jun 3, 2022 at 1:27 AM Jun Rao <ju...@confluent.io.invalid> wrote:

> Hi, Luke,
>
> Thanks for the reply.
>
> 10. You are saying it's difficult to track the number of segments to
> recover. But how do we
> implement
> kafka.log:type=LogManager,name=remainingSegmentsToRecovery,dir=([-._\/\w\d\s]+),threadNum=([0-9]+),
> which tracks at the segment level?
>
> Jun
>

Re: [DISCUSS] KIP-831: Add metric for log recovery progress

Posted by Jun Rao <ju...@confluent.io.INVALID>.
Hi, Luke,

Thanks for the reply.

10. You are saying it's difficult to track the number of segments to
recover. But how do we
implement kafka.log:type=LogManager,name=remainingSegmentsToRecovery,dir=([-._\/\w\d\s]+),threadNum=([0-9]+),
which tracks at the segment level?

Jun

On Thu, Jun 2, 2022 at 3:39 AM Luke Chen <sh...@gmail.com> wrote:

> Hi Jun,
>
> Thanks for the comment.
>
> Yes, I've tried to work on this way to track the number of remaining
> segments, but it will change the design in UnifiedLog, so I only track the
> logs number.
> Currently, we will load all segments and recover those segments if needed
> "during creating UnifiedLog instance". And also get the log offsets here
> <
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/UnifiedLog.scala#L1819-L1842
> >
> .
> That is, if we want to get all segments to be recovered before running log
> recovery, we need to break the logic in UnifiedLog, to create a partial
> UnifiedLog instance, and add more info to it later, which I think is just
> making the codes more complicated.
>
>
> Thank you.
> Luke
>
>
>
> On Thu, May 26, 2022 at 2:57 AM Jun Rao <ju...@confluent.io.invalid> wrote:
>
> > Hi, Luke,
> >
> > Thanks for the KIP. Just one comment.
> >
> > 10. For kafka.log:type=LogManager,name=remainingLogsToRecovery, could we
> > instead track the number of remaining segments? This monitors the
> progress
> > at a finer granularity and is also consistent with the thread level
> metric.
> >
> > Thanks,
> >
> > Jun
> >
> > On Wed, May 25, 2022 at 7:47 AM Tom Bentley <tb...@redhat.com> wrote:
> >
> > > Thanks Luke! LGTM.
> > >
> > > On Sun, 22 May 2022 at 05:18, Luke Chen <sh...@gmail.com> wrote:
> > >
> > > > Hi Tom and Raman,
> > > >
> > > > Thanks for your comments.
> > > >
> > > > > 1. There's not a JIRA for this KIP (or the JIRA link needs
> updating).
> > > > 2. Similarly the link to this discussion thread needs updating.
> > > > > Please update the links to JIRA and the discussion thread.
> > > >
> > > > Yes, thanks for the reminder. I've updated the KIP.
> > > >
> > > > > 3. I wonder whether we need to keep these metrics (with value 0)
> once
> > > the
> > > > broker enters the running state. Do you see it as valuable? A benefit
> > of
> > > > removing the metrics would be a reduction on storage required for
> > metric
> > > > stores which are recording these metrics.
> > > >
> > > > Yes, removing the metrics after log recovery completed is a good
> idea.
> > > > Updated the KIP.
> > > >
> > > > > 4. I think the KIP's public interfaces section could be a bit
> > clearer.
> > > > Previous KIPs which added metrics usually used a table, with the
> MBean
> > > > name, metric type and description. SeeKIP-551 for example (or
> KIP-748,
> > > > KIP-608). Similarly you could use a table in the proposed changes
> > section
> > > > rather than describing the tree you'd see in an MBean console.
> > > >
> > > > Good point! Updated the KIP to use a table to list the MBean name,
> > metric
> > > > type and descriptions.
> > > >
> > > >
> > > > Thank you.
> > > > Luke
> > > >
> > > > On Fri, May 20, 2022 at 9:13 AM Raman Verma
> > <rverma@confluent.io.invalid
> > > >
> > > > wrote:
> > > >
> > > > > Hi Luke,
> > > > >
> > > > > The change is useful and simple. Thanks.
> > > > > Please update the links to JIRA and the discussion thread.
> > > > >
> > > > > Best Regards,
> > > > > Raman Verma
> > > > >
> > > > > On Thu, May 19, 2022 at 8:57 AM Tom Bentley <tb...@redhat.com>
> > > wrote:
> > > > > >
> > > > > > Hi Luke,
> > > > > >
> > > > > > Thanks for the KIP. I think the idea makes sense and would
> provide
> > > > useful
> > > > > > observability of log recovery. I have a few comments.
> > > > > >
> > > > > > 1. There's not a JIRA for this KIP (or the JIRA link needs
> > updating).
> > > > > > 2. Similarly the link to this discussion thread needs updating.
> > > > > > 3. I wonder whether we need to keep these metrics (with value 0)
> > once
> > > > the
> > > > > > broker enters the running state. Do you see it as valuable? A
> > benefit
> > > > of
> > > > > > removing the metrics would be a reduction on storage required for
> > > > metric
> > > > > > stores which are recording these metrics.
> > > > > > 4. I think the KIP's public interfaces section could be a bit
> > > clearer.
> > > > > > Previous KIPs which added metrics usually used a table, with the
> > > MBean
> > > > > > name, metric type and description. SeeKIP-551 for example (or
> > > KIP-748,
> > > > > > KIP-608). Similarly you could use a table in the proposed changes
> > > > section
> > > > > > rather than describing the tree you'd see in an MBean console.
> > > > > >
> > > > > > Kind regards,
> > > > > >
> > > > > > Tom
> > > > > >
> > > > > > On Wed, 11 May 2022 at 09:08, Luke Chen <sh...@gmail.com>
> wrote:
> > > > > >
> > > > > > > > And if people start using RemainingLogs and RemainingSegments
> > and
> > > > > then
> > > > > > > REALLY FEEL like they need RemainingBytes, then we can always
> add
> > > it
> > > > > in the
> > > > > > > future.
> > > > > > >
> > > > > > > +1
> > > > > > >
> > > > > > > Thanks James!
> > > > > > > Luke
> > > > > > >
> > > > > > > On Wed, May 11, 2022 at 3:57 PM James Cheng <
> > wushujames@gmail.com>
> > > > > wrote:
> > > > > > >
> > > > > > > > Hi Luke,
> > > > > > > >
> > > > > > > > Thanks for the detailed explanation. I agree that the current
> > > > > proposal of
> > > > > > > > RemainingLogs and RemainingSegments will greatly improve the
> > > > > situation,
> > > > > > > and
> > > > > > > > that we can go ahead with the KIP as is.
> > > > > > > >
> > > > > > > > If RemainingBytes were straight-forward to implement, then
> I’d
> > > like
> > > > > to
> > > > > > > > have it. But we can live without it for now. And if people
> > start
> > > > > using
> > > > > > > > RemainingLogs and RemainingSegments and then REALLY FEEL like
> > > they
> > > > > need
> > > > > > > > RemainingBytes, then we can always add it in the future.
> > > > > > > >
> > > > > > > > Thanks Luke, for the detailed explanation, and for responding
> > to
> > > my
> > > > > > > > feedback!
> > > > > > > >
> > > > > > > > -James
> > > > > > > >
> > > > > > > > Sent from my iPhone
> > > > > > > >
> > > > > > > > > On May 10, 2022, at 6:48 AM, Luke Chen <sh...@gmail.com>
> > > > wrote:
> > > > > > > > >
> > > > > > > > > Hi James and all,
> > > > > > > > >
> > > > > > > > > I checked again and I can see when creating UnifiedLog, we
> > > > > expected the
> > > > > > > > > logs/indexes/snapshots are in good state.
> > > > > > > > > So, I don't think we should break the current design to
> > expose
> > > > the
> > > > > > > > > `RemainingBytesToRecovery`
> > > > > > > > > metric.
> > > > > > > > >
> > > > > > > > > If there is no other comments, I'll start a vote within
> this
> > > > week.
> > > > > > > > >
> > > > > > > > > Thank you.
> > > > > > > > > Luke
> > > > > > > > >
> > > > > > > > >> On Fri, May 6, 2022 at 6:00 PM Luke Chen <
> showuon@gmail.com
> > >
> > > > > wrote:
> > > > > > > > >>
> > > > > > > > >> Hi James,
> > > > > > > > >>
> > > > > > > > >> Thanks for your input.
> > > > > > > > >>
> > > > > > > > >> For the `RemainingBytesToRecovery` metric proposal, I
> think
> > > > > there's
> > > > > > > one
> > > > > > > > >> thing I didn't make it clear.
> > > > > > > > >> Currently, when log manager start up, we'll try to load
> all
> > > logs
> > > > > > > > >> (segments), and during the log loading, we'll try to
> recover
> > > > logs
> > > > > if
> > > > > > > > >> necessary.
> > > > > > > > >> And the logs loading is using "thread pool" as you
> thought.
> > > > > > > > >>
> > > > > > > > >> So, here's the problem:
> > > > > > > > >> All segments in each log folder (partition) will be loaded
> > in
> > > > > each log
> > > > > > > > >> recovery thread, and until it's loaded, we can know how
> many
> > > > > segments
> > > > > > > > (or
> > > > > > > > >> how many Bytes) needed to recover.
> > > > > > > > >> That means, if we have 10 partition logs in one broker,
> and
> > we
> > > > > have 2
> > > > > > > > log
> > > > > > > > >> recovery threads (num.recovery.threads.per.data.dir=2),
> > before
> > > > the
> > > > > > > > >> threads load the segments in each log, we only know how
> many
> > > > logs
> > > > > > > > >> (partitions) we have in the broker (i.e.
> > > RemainingLogsToRecover
> > > > > > > metric).
> > > > > > > > >> We cannot know how many segments/Bytes needed to recover
> > until
> > > > > each
> > > > > > > > thread
> > > > > > > > >> starts to load the segments under one log (partition).
> > > > > > > > >>
> > > > > > > > >> So, the example in the KIP, it shows:
> > > > > > > > >> Currently, there are still 5 logs (partitions) needed to
> > > recover
> > > > > under
> > > > > > > > >> /tmp/log1 dir. And there are 2 threads doing the jobs,
> where
> > > one
> > > > > > > thread
> > > > > > > > has
> > > > > > > > >> 10000 segments needed to recover, and the other one has 3
> > > > segments
> > > > > > > > needed
> > > > > > > > >> to recover.
> > > > > > > > >>
> > > > > > > > >>   - kafka.log
> > > > > > > > >>      - LogManager
> > > > > > > > >>         - RemainingLogsToRecover
> > > > > > > > >>            - /tmp/log1 => 5            ← there are 5 logs
> > > under
> > > > > > > > >>            /tmp/log1 needed to be recovered
> > > > > > > > >>            - /tmp/log2 => 0
> > > > > > > > >>         - RemainingSegmentsToRecover
> > > > > > > > >>            - /tmp/log1                     ← 2 threads are
> > > doing
> > > > > log
> > > > > > > > >>            recovery for /tmp/log1
> > > > > > > > >>            - 0 => 10000         ← there are 10000 segments
> > > > needed
> > > > > to
> > > > > > > be
> > > > > > > > >>               recovered for thread 0
> > > > > > > > >>               - 1 => 3
> > > > > > > > >>               - /tmp/log2
> > > > > > > > >>               - 0 => 0
> > > > > > > > >>               - 1 => 0
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >> So, after a while, the metrics might look like this:
> > > > > > > > >> It said, now, there are only 3 logs needed to recover in
> > > > > /tmp/log1,
> > > > > > > and
> > > > > > > > >> the thread 0 has 9000 segments left, and thread 1 has 5
> > > segments
> > > > > left
> > > > > > > > >> (which should imply the thread already completed 2 logs
> > > recovery
> > > > > in
> > > > > > > the
> > > > > > > > >> period)
> > > > > > > > >>
> > > > > > > > >>   - kafka.log
> > > > > > > > >>      - LogManager
> > > > > > > > >>         - RemainingLogsToRecover
> > > > > > > > >>            - /tmp/log1 => 3            ← there are 3 logs
> > > under
> > > > > > > > >>            /tmp/log1 needed to be recovered
> > > > > > > > >>            - /tmp/log2 => 0
> > > > > > > > >>         - RemainingSegmentsToRecover
> > > > > > > > >>            - /tmp/log1                     ← 2 threads are
> > > doing
> > > > > log
> > > > > > > > >>            recovery for /tmp/log1
> > > > > > > > >>            - 0 => 9000         ← there are 9000 segments
> > > needed
> > > > > to be
> > > > > > > > >>               recovered for thread 0
> > > > > > > > >>               - 1 => 5
> > > > > > > > >>               - /tmp/log2
> > > > > > > > >>               - 0 => 0
> > > > > > > > >>               - 1 => 0
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >> That said, the `RemainingBytesToRecovery` metric is
> > difficult
> > > to
> > > > > > > achieve
> > > > > > > > >> as you expected. I think the current proposal with
> > > > > > > > `RemainingLogsToRecover`
> > > > > > > > >> and `RemainingSegmentsToRecover` should already provide
> > enough
> > > > > info
> > > > > > > for
> > > > > > > > >> the log recovery progress.
> > > > > > > > >>
> > > > > > > > >> I've also updated the KIP example to make it clear.
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >> Thank you.
> > > > > > > > >> Luke
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >>> On Thu, May 5, 2022 at 3:31 AM James Cheng <
> > > > wushujames@gmail.com
> > > > > >
> > > > > > > > wrote:
> > > > > > > > >>>
> > > > > > > > >>> Hi Luke,
> > > > > > > > >>>
> > > > > > > > >>> Thanks for adding RemainingSegmentsToRecovery.
> > > > > > > > >>>
> > > > > > > > >>> Another thought: different topics can have different
> > segment
> > > > > sizes. I
> > > > > > > > >>> don't know how common it is, but it is possible. Some
> > topics
> > > > > might
> > > > > > > want
> > > > > > > > >>> small segment sizes to allow more granular expiration of data.
> > > > > > > > >>>
> > > > > > > > >>> The downside of RemainingLogsToRecovery and
> > > > > > > RemainingSegmentsToRecovery
> > > > > > > > >>> is that the rate that they will decrement depends on the
> > > > > > > configuration
> > > > > > > > and
> > > > > > > > >>> patterns of the topics and partitions and segment sizes.
> If
> > > > > someone
> > > > > > > is
> > > > > > > > >>> monitoring those metrics, they might see times where the
> > > metric
> > > > > > > > decrements
> > > > > > > > >>> slowly, followed by a burst where it decrements quickly.
> > > > > > > > >>>
> > > > > > > > >>> What about RemainingBytesToRecovery? This would not
> depend
> > on
> > > > the
> > > > > > > > >>> configuration of the topic or of the data. It would
> > actually
> > > > be a
> > > > > > > > pretty
> > > > > > > > >>> good metric, because I think that this metric would
> change
> > > at a
> > > > > > > > constant
> > > > > > > > >>> rate (based on the disk I/O speed that the broker
> allocates
> > > to
> > > > > > > > recovery).
> > > > > > > > >>> Because it changes at a constant rate, you would be able
> to
> > > use
> > > > > the
> > > > > > > > >>> rate-of-change to predict when it hits zero, which will
> let
> > > you
> > > > > know
> > > > > > > > when
> > > > > > > > >>> the broker is going to start up. Like, I would imagine if
> > we
> > > > > graphed
> > > > > > > > >>> RemainingBytesToRecovery that we'd see a fairly straight
> > line
> > > > > that is
> > > > > > > > >>> decrementing at a steady rate towards zero.
> > > > > > > > >>>
> > > > > > > > >>> What do you think about adding RemainingBytesToRecovery?
> > > > > > > > >>>
> > > > > > > > >>> Or, what would you think about making the primary metric
> be
> > > > > > > > >>> RemainingBytesToRecovery, and getting rid of the others?
> > > > > > > > >>>
> > > > > > > > >>> I don't know if I personally would rather have all 3
> > metrics,
> > > > or
> > > > > > > would
> > > > > > > > >>> just use RemainingBytesToRecovery. I too would like
> more
> > > > > community
> > > > > > > > input
> > > > > > > > >>> on which of those metrics would be useful to people.
> > > > > > > > >>>
> > > > > > > > >>> About the JMX metrics, you said that if
> > > > > > > > >>> num.recovery.threads.per.data.dir=2, that there might be
> a
> > > > > separate
> > > > > > > > >>> RemainingSegmentsToRecovery counter for each thread. Is
> > that
> > > > > actually
> > > > > > > > how
> > > > > > > > >>> the data is structured within the Kafka recovery threads?
> > > Does
> > > > > each
> > > > > > > > thread
> > > > > > > > >>> get a fixed set of partitions, or is there just one big
> > pool
> > > of
> > > > > > > > partitions
> > > > > > > > >>> that the threads all work on?
> > > > > > > > >>>
> > > > > > > > >>> As a more concrete example:
> > > > > > > > >>> * If I have 9 small partitions and 1 big partition, and
> > > > > > > > >>> num.recovery.threads.per.data.dir=2
> > > > > > > > >>> Does each thread get 5 partitions, which means one thread
> > > will
> > > > > finish
> > > > > > > > >>> much sooner than the other?
> > > > > > > > >>> OR
> > > > > > > > >>> Do both threads just work on the set of 10 partitions,
> > which
> > > > > means
> > > > > > > > likely
> > > > > > > > >>> 1 thread will be busy with the big partition, while the
> > other
> > > > one
> > > > > > > ends
> > > > > > > > up
> > > > > > > > >>> plowing through the 9 small partitions?
> > > > > > > > >>>
> > > > > > > > >>> If each thread gets assigned 5 partitions, then it would
> > make
> > > > > sense
> > > > > > > > that
> > > > > > > > >>> each thread has its own counter.
> > > > > > > > >>> If the threads work on a single pool of 10 partitions,
> > then
> > > it
> > > > > would
> > > > > > > > >>> probably mean that the counter is on the pool of
> partitions
> > > > > itself,
> > > > > > > > and not
> > > > > > > > >>> on each thread.
> > > > > > > > >>>
> > > > > > > > >>> -James
> > > > > > > > >>>
> > > > > > > > >>>> On May 4, 2022, at 5:55 AM, Luke Chen <
> showuon@gmail.com>
> > > > > wrote:
> > > > > > > > >>>>
> > > > > > > > >>>> Hi devs,
> > > > > > > > >>>>
> > > > > > > > >>>> If there are no other comments, I'll start a vote
> > tomorrow.
> > > > > > > > >>>>
> > > > > > > > >>>> Thank you.
> > > > > > > > >>>> Luke
> > > > > > > > >>>>
> > > > > > > > >>>> On Sun, May 1, 2022 at 5:08 PM Luke Chen <
> > showuon@gmail.com
> > > >
> > > > > wrote:
> > > > > > > > >>>>
> > > > > > > > >>>>> Hi James,
> > > > > > > > >>>>>
> > > > > > > > >>>>> Sorry for the late reply.
> > > > > > > > >>>>>
> > > > > > > > >>>>> Yes, this is a good point, to know how many segments to
> > be
> > > > > > > recovered
> > > > > > > > if
> > > > > > > > >>>>> there are some large partitions.
> > > > > > > > >>>>> I've updated the KIP, to add a
> > > `*RemainingSegmentsToRecover*`
> > > > > > > metric
> > > > > > > > >>> for
> > > > > > > > >>>>> each log recovery thread, to show the value.
> > > > > > > > >>>>> The example in the Proposed section here
> > > > > > > > >>>>> <
> > > > > > > > >>>
> > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-831%3A+Add+metric+for+log+recovery+progress#KIP831:Addmetricforlogrecoveryprogress-ProposedChanges
> > > > > > > > >>>>
> > > > > > > > >>>>> shows what it will look like.
> > > > > > > > >>>>>
> > > > > > > > >>>>> Thanks for the suggestion.
> > > > > > > > >>>>>
> > > > > > > > >>>>> Thank you.
> > > > > > > > >>>>> Luke
> > > > > > > > >>>>>
> > > > > > > > >>>>>
> > > > > > > > >>>>>
> > > > > > > > >>>>> On Sat, Apr 23, 2022 at 8:54 AM James Cheng <
> > > > > wushujames@gmail.com>
> > > > > > > > >>> wrote:
> > > > > > > > >>>>>
> > > > > > > > >>>>>> The KIP describes RemainingLogsToRecovery, which seems
> > to
> > > be
> > > > > the
> > > > > > > > >>> number
> > > > > > > > >>>>>> of partitions in each log.dir.
> > > > > > > > >>>>>>
> > > > > > > > >>>>>> We have some partitions which are much much larger
> than
> > > > > others.
> > > > > > > > Those
> > > > > > > > >>>>>> large partitions have many many more segments than
> > others.
> > > > > > > > >>>>>>
> > > > > > > > >>>>>> Is there a way the metric can reflect partition size?
> > > Could
> > > > > it be
> > > > > > > > >>>>>> RemainingSegmentsToRecover? Or even
> > > RemainingBytesToRecover?
> > > > > > > > >>>>>>
> > > > > > > > >>>>>> -James
> > > > > > > > >>>>>>
> > > > > > > > >>>>>> Sent from my iPhone
> > > > > > > > >>>>>>
> > > > > > > > >>>>>>> On Apr 20, 2022, at 2:01 AM, Luke Chen <
> > > showuon@gmail.com>
> > > > > > > wrote:
> > > > > > > > >>>>>>>
> > > > > > > > >>>>>>> Hi all,
> > > > > > > > >>>>>>>
> > > > > > > > >>>>>>> I'd like to propose a KIP to expose a metric for log
> > > > recovery
> > > > > > > > >>> progress.
> > > > > > > > >>>>>>> This metric would let the admins have a way to
> monitor
> > > the
> > > > > log
> > > > > > > > >>> recovery
> > > > > > > > >>>>>>> progress.
> > > > > > > > >>>>>>> Details can be found here:
> > > > > > > > >>>>>>>
> > > > > > > > >>>>>>
> > > > > > > > >>>
> > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-831%3A+Add+metric+for+log+recovery+progress
> > > > > > > > >>>>>>>
> > > > > > > > >>>>>>> Any feedback is appreciated.
> > > > > > > > >>>>>>>
> > > > > > > > >>>>>>> Thank you.
> > > > > > > > >>>>>>> Luke
> > > > > > > > >>>>>>
> > > > > > > > >>>>>
> > > > > > > > >>>
> > > > > > > > >>>
> > > > > > > >
> > > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Best Regards,
> > > > > Raman Verma
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-831: Add metric for log recovery progress

Posted by Luke Chen <sh...@gmail.com>.
Hi Jun,

Thanks for the comment.

Yes, I tried tracking the number of remaining segments that way, but it
would change the design of UnifiedLog, so I only track the number of logs.
Currently, we load all segments, and recover them if needed, while creating
the UnifiedLog instance, and we also get the log offsets here
<https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/UnifiedLog.scala#L1819-L1842>
.
That is, to know the total number of segments to be recovered before running
log recovery, we would need to break the logic in UnifiedLog, create a
partial UnifiedLog instance, and add more info to it later, which I think
would just make the code more complicated.
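For reference, the per-thread counting can be kept entirely outside
UnifiedLog. Below is a rough, illustrative sketch (not the actual patch;
the class and method names are made up): a thread's counter is only set
once that thread has listed the segments of the log it is currently
loading, and a gauge registered per log dir and thread would simply read
the current value.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class LogRecoveryProgress {

    // recovery thread id -> segments still to recover for the log that
    // thread is currently loading
    private final Map<Integer, AtomicInteger> remainingByThread =
        new ConcurrentHashMap<>();

    // called right after a recovery thread has listed the segments of the
    // log it picked up, i.e. the earliest point at which the count is known
    public void startLog(int threadId, int segmentCount) {
        remainingByThread
            .computeIfAbsent(threadId, id -> new AtomicInteger())
            .set(segmentCount);
    }

    // called after each segment of that log has been recovered
    public void segmentRecovered(int threadId) {
        AtomicInteger counter = remainingByThread.get(threadId);
        if (counter != null) {
            counter.decrementAndGet();
        }
    }

    // value a per-dir, per-thread gauge would report
    public int remainingSegments(int threadId) {
        AtomicInteger counter = remainingByThread.get(threadId);
        return counter == null ? 0 : counter.get();
    }
}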


Thank you.
Luke



On Thu, May 26, 2022 at 2:57 AM Jun Rao <ju...@confluent.io.invalid> wrote:

> Hi, Luke,
>
> Thanks for the KIP. Just one comment.
>
> 10. For kafka.log:type=LogManager,name=remainingLogsToRecovery, could we
> instead track the number of remaining segments? This monitors the progress
> at a finer granularity and is also consistent with the thread level metric.
>
> Thanks,
>
> Jun
>
> On Wed, May 25, 2022 at 7:47 AM Tom Bentley <tb...@redhat.com> wrote:
>
> > Thanks Luke! LGTM.
> >
> > On Sun, 22 May 2022 at 05:18, Luke Chen <sh...@gmail.com> wrote:
> >
> > > Hi Tom and Raman,
> > >
> > > Thanks for your comments.
> > >
> > > > 1. There's not a JIRA for this KIP (or the JIRA link needs updating).
> > > 2. Similarly the link to this discussion thread needs updating.
> > > > Please update the links to JIRA and the discussion thread.
> > >
> > > Yes, thanks for the reminder. I've updated the KIP.
> > >
> > > > 3. I wonder whether we need to keep these metrics (with value 0) once
> > the
> > > broker enters the running state. Do you see it as valuable? A benefit
> of
> > > removing the metrics would be a reduction on storage required for
> metric
> > > stores which are recording these metrics.
> > >
> > > Yes, removing the metrics after log recovery has completed is a good idea.
> > > Updated the KIP.
> > >
> > > > 4. I think the KIP's public interfaces section could be a bit
> clearer.
> > > Previous KIPs which added metrics usually used a table, with the MBean
> > > name, metric type and description. See KIP-551 for example (or KIP-748,
> > > KIP-608). Similarly you could use a table in the proposed changes
> section
> > > rather than describing the tree you'd see in an MBean console.
> > >
> > > Good point! Updated the KIP to use a table to list the MBean name,
> metric
> > > type and descriptions.
> > >
> > >
> > > Thank you.
> > > Luke
> > >
> > > On Fri, May 20, 2022 at 9:13 AM Raman Verma
> <rverma@confluent.io.invalid
> > >
> > > wrote:
> > >
> > > > Hi Luke,
> > > >
> > > > The change is useful and simple. Thanks.
> > > > Please update the links to JIRA and the discussion thread.
> > > >
> > > > Best Regards,
> > > > Raman Verma
> > > >
> > > > On Thu, May 19, 2022 at 8:57 AM Tom Bentley <tb...@redhat.com>
> > wrote:
> > > > >
> > > > > Hi Luke,
> > > > >
> > > > > Thanks for the KIP. I think the idea makes sense and would provide
> > > useful
> > > > > observability of log recovery. I have a few comments.
> > > > >
> > > > > 1. There's not a JIRA for this KIP (or the JIRA link needs
> updating).
> > > > > 2. Similarly the link to this discussion thread needs updating.
> > > > > 3. I wonder whether we need to keep these metrics (with value 0)
> once
> > > the
> > > > > broker enters the running state. Do you see it as valuable? A
> benefit
> > > of
> > > > > removing the metrics would be a reduction on storage required for
> > > metric
> > > > > stores which are recording these metrics.
> > > > > 4. I think the KIP's public interfaces section could be a bit
> > clearer.
> > > > > Previous KIPs which added metrics usually used a table, with the
> > MBean
> > > > > name, metric type and description. See KIP-551 for example (or
> > KIP-748,
> > > > > KIP-608). Similarly you could use a table in the proposed changes
> > > section
> > > > > rather than describing the tree you'd see in an MBean console.
> > > > >
> > > > > Kind regards,
> > > > >
> > > > > Tom
> > > > >
> > > > > On Wed, 11 May 2022 at 09:08, Luke Chen <sh...@gmail.com> wrote:
> > > > >
> > > > > > > And if people start using RemainingLogs and RemainingSegments
> and
> > > > then
> > > > > > REALLY FEEL like they need RemainingBytes, then we can always add
> > it
> > > > in the
> > > > > > future.
> > > > > >
> > > > > > +1
> > > > > >
> > > > > > Thanks James!
> > > > > > Luke
> > > > > >
> > > > > > On Wed, May 11, 2022 at 3:57 PM James Cheng <
> wushujames@gmail.com>
> > > > wrote:
> > > > > >
> > > > > > > Hi Luke,
> > > > > > >
> > > > > > > Thanks for the detailed explanation. I agree that the current
> > > > proposal of
> > > > > > > RemainingLogs and RemainingSegments will greatly improve the
> > > > situation,
> > > > > > and
> > > > > > > that we can go ahead with the KIP as is.
> > > > > > >
> > > > > > > If RemainingBytes were straight-forward to implement, then I’d
> > like
> > > > to
> > > > > > > have it. But we can live without it for now. And if people
> start
> > > > using
> > > > > > > RemainingLogs and RemainingSegments and then REALLY FEEL like
> > they
> > > > need
> > > > > > > RemainingBytes, then we can always add it in the future.
> > > > > > >
> > > > > > > Thanks Luke, for the detailed explanation, and for responding
> to
> > my
> > > > > > > feedback!
> > > > > > >
> > > > > > > -James
> > > > > > >
> > > > > > > Sent from my iPhone
> > > > > > >
> > > > > > > > On May 10, 2022, at 6:48 AM, Luke Chen <sh...@gmail.com>
> > > wrote:
> > > > > > > >
> > > > > > > > Hi James and all,
> > > > > > > >
> > > > > > > > I checked again and I can see when creating UnifiedLog, we
> > > > expected the
> > > > > > > > logs/indexes/snapshots are in good state.
> > > > > > > > So, I don't think we should break the current design to
> expose
> > > the
> > > > > > > > `RemainingBytesToRecovery`
> > > > > > > > metric.
> > > > > > > >
> > > > > > > > If there is no other comments, I'll start a vote within this
> > > week.
> > > > > > > >
> > > > > > > > Thank you.
> > > > > > > > Luke
> > > > > > > >
> > > > > > > >> On Fri, May 6, 2022 at 6:00 PM Luke Chen <showuon@gmail.com
> >
> > > > wrote:
> > > > > > > >>
> > > > > > > >> Hi James,
> > > > > > > >>
> > > > > > > >> Thanks for your input.
> > > > > > > >>
> > > > > > > >> For the `RemainingBytesToRecovery` metric proposal, I think
> > > > there's
> > > > > > one
> > > > > > > >> thing I didn't make it clear.
> > > > > > > >> Currently, when the log manager starts up, we'll try to load all
> > logs
> > > > > > > >> (segments), and during the log loading, we'll try to recover
> > > logs
> > > > if
> > > > > > > >> necessary.
> > > > > > > >> And the logs loading is using "thread pool" as you thought.
> > > > > > > >>
> > > > > > > >> So, here's the problem:
> > > > > > > >> All segments in each log folder (partition) will be loaded
> in
> > > > each log
> > > > > > > >> recovery thread, and until it's loaded, we can know how many
> > > > segments
> > > > > > > (or
> > > > > > > >> how many Bytes) needed to recover.
> > > > > > > >> That means, if we have 10 partition logs in one broker, and
> we
> > > > have 2
> > > > > > > log
> > > > > > > >> recovery threads (num.recovery.threads.per.data.dir=2),
> before
> > > the
> > > > > > > >> threads load the segments in each log, we only know how many
> > > logs
> > > > > > > >> (partitions) we have in the broker (i.e.
> > RemainingLogsToRecover
> > > > > > metric).
> > > > > > > >> We cannot know how many segments/Bytes needed to recover
> until
> > > > each
> > > > > > > thread
> > > > > > > >> starts to load the segments under one log (partition).
> > > > > > > >>
> > > > > > > >> So, the example in the KIP, it shows:
> > > > > > > >> Currently, there are still 5 logs (partitions) needed to
> > recover
> > > > under
> > > > > > > >> /tmp/log1 dir. And there are 2 threads doing the jobs, where
> > one
> > > > > > thread
> > > > > > > has
> > > > > > > >> 10000 segments needed to recover, and the other one has 3
> > > segments
> > > > > > > needed
> > > > > > > >> to recover.
> > > > > > > >>
> > > > > > > >>   - kafka.log
> > > > > > > >>      - LogManager
> > > > > > > >>         - RemainingLogsToRecover
> > > > > > > >>            - /tmp/log1 => 5            ← there are 5 logs
> > under
> > > > > > > >>            /tmp/log1 needed to be recovered
> > > > > > > >>            - /tmp/log2 => 0
> > > > > > > >>         - RemainingSegmentsToRecover
> > > > > > > >>            - /tmp/log1                     ← 2 threads are
> > doing
> > > > log
> > > > > > > >>            recovery for /tmp/log1
> > > > > > > >>            - 0 => 10000         ← there are 10000 segments
> > > needed
> > > > to
> > > > > > be
> > > > > > > >>               recovered for thread 0
> > > > > > > >>               - 1 => 3
> > > > > > > >>               - /tmp/log2
> > > > > > > >>               - 0 => 0
> > > > > > > >>               - 1 => 0
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> So, after a while, the metrics might look like this:
> > > > > > > >> It said, now, there are only 3 logs needed to recover in
> > > > /tmp/log1,
> > > > > > and
> > > > > > > >> the thread 0 has 9000 segments left, and thread 1 has 5
> > segments
> > > > left
> > > > > > > >> (which should imply the thread already completed 2 logs
> > recovery
> > > > in
> > > > > > the
> > > > > > > >> period)
> > > > > > > >>
> > > > > > > >>   - kafka.log
> > > > > > > >>      - LogManager
> > > > > > > >>         - RemainingLogsToRecover
> > > > > > > >>            - /tmp/log1 => 3            ← there are 3 logs
> > under
> > > > > > > >>            /tmp/log1 needed to be recovered
> > > > > > > >>            - /tmp/log2 => 0
> > > > > > > >>         - RemainingSegmentsToRecover
> > > > > > > >>            - /tmp/log1                     ← 2 threads are
> > doing
> > > > log
> > > > > > > >>            recovery for /tmp/log1
> > > > > > > >>            - 0 => 9000         ← there are 9000 segments
> > needed
> > > > to be
> > > > > > > >>               recovered for thread 0
> > > > > > > >>               - 1 => 5
> > > > > > > >>               - /tmp/log2
> > > > > > > >>               - 0 => 0
> > > > > > > >>               - 1 => 0
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> That said, the `RemainingBytesToRecovery` metric is
> difficult
> > to
> > > > > > achieve
> > > > > > > >> as you expected. I think the current proposal with
> > > > > > > `RemainingLogsToRecover`
> > > > > > > >> and `RemainingSegmentsToRecover` should already provide
> enough
> > > > info
> > > > > > for
> > > > > > > >> the log recovery progress.
> > > > > > > >>
> > > > > > > >> I've also updated the KIP example to make it clear.
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> Thank you.
> > > > > > > >> Luke
> > > > > > > >>
> > > > > > > >>
> > > > > > > >>> On Thu, May 5, 2022 at 3:31 AM James Cheng <
> > > wushujames@gmail.com
> > > > >
> > > > > > > wrote:
> > > > > > > >>>
> > > > > > > >>> Hi Luke,
> > > > > > > >>>
> > > > > > > >>> Thanks for adding RemainingSegmentsToRecovery.
> > > > > > > >>>
> > > > > > > >>> Another thought: different topics can have different
> segment
> > > > sizes. I
> > > > > > > >>> don't know how common it is, but it is possible. Some
> topics
> > > > might
> > > > > > want
> > > > > > > >>> small segment sizes to allow more granular expiration of data.
> > > > > > > >>>
> > > > > > > >>> The downside of RemainingLogsToRecovery and
> > > > > > RemainingSegmentsToRecovery
> > > > > > > >>> is that the rate that they will decrement depends on the
> > > > > > configuration
> > > > > > > and
> > > > > > > >>> patterns of the topics and partitions and segment sizes. If
> > > > someone
> > > > > > is
> > > > > > > >>> monitoring those metrics, they might see times where the
> > metric
> > > > > > > decrements
> > > > > > > >>> slowly, followed by a burst where it decrements quickly.
> > > > > > > >>>
> > > > > > > >>> What about RemainingBytesToRecovery? This would not depend
> on
> > > the
> > > > > > > >>> configuration of the topic or of the data. It would
> actually
> > > be a
> > > > > > > pretty
> > > > > > > >>> good metric, because I think that this metric would change
> > at a
> > > > > > > constant
> > > > > > > >>> rate (based on the disk I/O speed that the broker allocates
> > to
> > > > > > > recovery).
> > > > > > > >>> Because it changes at a constant rate, you would be able to
> > use
> > > > the
> > > > > > > >>> rate-of-change to predict when it hits zero, which will let
> > you
> > > > know
> > > > > > > when
> > > > > > > >>> the broker is going to start up. Like, I would imagine if
> we
> > > > graphed
> > > > > > > >>> RemainingBytesToRecovery that we'd see a fairly straight
> line
> > > > that is
> > > > > > > >>> decrementing at a steady rate towards zero.
> > > > > > > >>>
> > > > > > > >>> What do you think about adding RemainingBytesToRecovery?
> > > > > > > >>>
> > > > > > > >>> Or, what would you think about making the primary metric be
> > > > > > > >>> RemainingBytesToRecovery, and getting rid of the others?
> > > > > > > >>>
> > > > > > > >>> I don't know if I personally would rather have all 3
> metrics,
> > > or
> > > > > > would
> > > > > > > >>> just use RemainingBytesToRecovery. I too would like more
> > > > community
> > > > > > > input
> > > > > > > >>> on which of those metrics would be useful to people.
> > > > > > > >>>
> > > > > > > >>> About the JMX metrics, you said that if
> > > > > > > >>> num.recovery.threads.per.data.dir=2, that there might be a
> > > > separate
> > > > > > > >>> RemainingSegmentsToRecovery counter for each thread. Is
> that
> > > > actually
> > > > > > > how
> > > > > > > >>> the data is structured within the Kafka recovery threads?
> > Does
> > > > each
> > > > > > > thread
> > > > > > > >>> get a fixed set of partitions, or is there just one big
> pool
> > of
> > > > > > > partitions
> > > > > > > >>> that the threads all work on?
> > > > > > > >>>
> > > > > > > >>> As a more concrete example:
> > > > > > > >>> * If I have 9 small partitions and 1 big partition, and
> > > > > > > >>> num.recovery.threads.per.data.dir=2
> > > > > > > >>> Does each thread get 5 partitions, which means one thread
> > will
> > > > finish
> > > > > > > >>> much sooner than the other?
> > > > > > > >>> OR
> > > > > > > >>> Do both threads just work on the set of 10 partitions,
> which
> > > > means
> > > > > > > likely
> > > > > > > >>> 1 thread will be busy with the big partition, while the
> other
> > > one
> > > > > > ends
> > > > > > > up
> > > > > > > >>> plowing through the 9 small partitions?
> > > > > > > >>>
> > > > > > > >>> If each thread gets assigned 5 partitions, then it would
> make
> > > > sense
> > > > > > > that
> > > > > > > >>> each thread has its own counter.
> > > > > > > >>> If the threads work on a single pool of 10 partitions,
> then
> > it
> > > > would
> > > > > > > >>> probably mean that the counter is on the pool of partitions
> > > > itself,
> > > > > > > and not
> > > > > > > >>> on each thread.
> > > > > > > >>>
> > > > > > > >>> -James
> > > > > > > >>>
> > > > > > > >>>> On May 4, 2022, at 5:55 AM, Luke Chen <sh...@gmail.com>
> > > > wrote:
> > > > > > > >>>>
> > > > > > > >>>> Hi devs,
> > > > > > > >>>>
> > > > > > > >>>> If there are no other comments, I'll start a vote
> tomorrow.
> > > > > > > >>>>
> > > > > > > >>>> Thank you.
> > > > > > > >>>> Luke
> > > > > > > >>>>
> > > > > > > >>>> On Sun, May 1, 2022 at 5:08 PM Luke Chen <
> showuon@gmail.com
> > >
> > > > wrote:
> > > > > > > >>>>
> > > > > > > >>>>> Hi James,
> > > > > > > >>>>>
> > > > > > > >>>>> Sorry for the late reply.
> > > > > > > >>>>>
> > > > > > > >>>>> Yes, this is a good point, to know how many segments to
> be
> > > > > > recovered
> > > > > > > if
> > > > > > > >>>>> there are some large partitions.
> > > > > > > >>>>> I've updated the KIP, to add a
> > `*RemainingSegmentsToRecover*`
> > > > > > metric
> > > > > > > >>> for
> > > > > > > >>>>> each log recovery thread, to show the value.
> > > > > > > >>>>> The example in the Proposed section here
> > > > > > > >>>>> <
> > > > > > > >>>
> > > > > > >
> > > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-831%3A+Add+metric+for+log+recovery+progress#KIP831:Addmetricforlogrecoveryprogress-ProposedChanges
> > > > > > > >>>>
> > > > > > > >>>>> shows what it will look like.
> > > > > > > >>>>>
> > > > > > > >>>>> Thanks for the suggestion.
> > > > > > > >>>>>
> > > > > > > >>>>> Thank you.
> > > > > > > >>>>> Luke
> > > > > > > >>>>>
> > > > > > > >>>>>
> > > > > > > >>>>>
> > > > > > > >>>>> On Sat, Apr 23, 2022 at 8:54 AM James Cheng <
> > > > wushujames@gmail.com>
> > > > > > > >>> wrote:
> > > > > > > >>>>>
> > > > > > > >>>>>> The KIP describes RemainingLogsToRecovery, which seems
> to
> > be
> > > > the
> > > > > > > >>> number
> > > > > > > >>>>>> of partitions in each log.dir.
> > > > > > > >>>>>>
> > > > > > > >>>>>> We have some partitions which are much much larger than
> > > > others.
> > > > > > > Those
> > > > > > > >>>>>> large partitions have many many more segments than
> others.
> > > > > > > >>>>>>
> > > > > > > >>>>>> Is there a way the metric can reflect partition size?
> > Could
> > > > it be
> > > > > > > >>>>>> RemainingSegmentsToRecover? Or even
> > RemainingBytesToRecover?
> > > > > > > >>>>>>
> > > > > > > >>>>>> -James
> > > > > > > >>>>>>
> > > > > > > >>>>>> Sent from my iPhone
> > > > > > > >>>>>>
> > > > > > > >>>>>>> On Apr 20, 2022, at 2:01 AM, Luke Chen <
> > showuon@gmail.com>
> > > > > > wrote:
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> Hi all,
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> I'd like to propose a KIP to expose a metric for log
> > > recovery
> > > > > > > >>> progress.
> > > > > > > >>>>>>> This metric would let the admins have a way to monitor
> > the
> > > > log
> > > > > > > >>> recovery
> > > > > > > >>>>>>> progress.
> > > > > > > >>>>>>> Details can be found here:
> > > > > > > >>>>>>>
> > > > > > > >>>>>>
> > > > > > > >>>
> > > > > > >
> > > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-831%3A+Add+metric+for+log+recovery+progress
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> Any feedback is appreciated.
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> Thank you.
> > > > > > > >>>>>>> Luke
> > > > > > > >>>>>>
> > > > > > > >>>>>
> > > > > > > >>>
> > > > > > > >>>
> > > > > > >
> > > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Best Regards,
> > > > Raman Verma
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-831: Add metric for log recovery progress

Posted by Jun Rao <ju...@confluent.io.INVALID>.
Hi, Luke,

Thanks for the KIP. Just one comment.

10. For kafka.log:type=LogManager,name=remainingLogsToRecovery, could we
instead track the number of remaining segments? This monitors the progress
at a finer granularity and is also consistent with the thread level metric.
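For illustration, this is roughly how an operator could consume the
segment-level gauges over JMX. It is only a sketch: the JMX port, the exact
ObjectName keys and the "Value" attribute are assumptions about how the
final metrics end up being exposed, not something the KIP defines.

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Polls the proposed recovery gauges on a broker's JMX endpoint and prints
// whatever matches, e.g. one value per log dir and recovery thread.
public class RecoveryProgressProbe {
    public static void main(String[] args) throws Exception {
        // hypothetical broker JMX endpoint
        String url = "service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi";
        try (JMXConnector connector =
                 JMXConnectorFactory.connect(new JMXServiceURL(url))) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            // assumed name pattern; the KIP's final MBean names may differ
            ObjectName pattern = new ObjectName(
                "kafka.log:type=LogManager,name=remainingSegmentsToRecover,*");
            for (ObjectName name : conn.queryNames(pattern, null)) {
                System.out.println(name + " => " + conn.getAttribute(name, "Value"));
            }
        }
    }
}

Watching how quickly these values drop gives a rough idea of when recovery
will finish.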

Thanks,

Jun

On Wed, May 25, 2022 at 7:47 AM Tom Bentley <tb...@redhat.com> wrote:

> Thanks Luke! LGTM.
>
> On Sun, 22 May 2022 at 05:18, Luke Chen <sh...@gmail.com> wrote:
>
> > Hi Tom and Raman,
> >
> > Thanks for your comments.
> >
> > > 1. There's not a JIRA for this KIP (or the JIRA link needs updating).
> > 2. Similarly the link to this discussion thread needs updating.
> > > Please update the links to JIRA and the discussion thread.
> >
> > Yes, thanks for the reminder. I've updated the KIP.
> >
> > > 3. I wonder whether we need to keep these metrics (with value 0) once
> the
> > broker enters the running state. Do you see it as valuable? A benefit of
> > removing the metrics would be a reduction on storage required for metric
> > stores which are recording these metrics.
> >
> > Yes, removing the metrics after log recovery has completed is a good idea.
> > Updated the KIP.
> >
> > > 4. I think the KIP's public interfaces section could be a bit clearer.
> > Previous KIPs which added metrics usually used a table, with the MBean
> > name, metric type and description. See KIP-551 for example (or KIP-748,
> > KIP-608). Similarly you could use a table in the proposed changes section
> > rather than describing the tree you'd see in an MBean console.
> >
> > Good point! Updated the KIP to use a table to list the MBean name, metric
> > type and descriptions.
> >
> >
> > Thank you.
> > Luke
> >
> > On Fri, May 20, 2022 at 9:13 AM Raman Verma <rverma@confluent.io.invalid
> >
> > wrote:
> >
> > > Hi Luke,
> > >
> > > The change is useful and simple. Thanks.
> > > Please update the links to JIRA and the discussion thread.
> > >
> > > Best Regards,
> > > Raman Verma
> > >
> > > On Thu, May 19, 2022 at 8:57 AM Tom Bentley <tb...@redhat.com>
> wrote:
> > > >
> > > > Hi Luke,
> > > >
> > > > Thanks for the KIP. I think the idea makes sense and would provide
> > useful
> > > > observability of log recovery. I have a few comments.
> > > >
> > > > 1. There's not a JIRA for this KIP (or the JIRA link needs updating).
> > > > 2. Similarly the link to this discussion thread needs updating.
> > > > 3. I wonder whether we need to keep these metrics (with value 0) once
> > the
> > > > broker enters the running state. Do you see it as valuable? A benefit
> > of
> > > > removing the metrics would be a reduction on storage required for
> > metric
> > > > stores which are recording these metrics.
> > > > 4. I think the KIP's public interfaces section could be a bit
> clearer.
> > > > Previous KIPs which added metrics usually used a table, with the
> MBean
> > > > name, metric type and description. See KIP-551 for example (or
> KIP-748,
> > > > KIP-608). Similarly you could use a table in the proposed changes
> > section
> > > > rather than describing the tree you'd see in an MBean console.
> > > >
> > > > Kind regards,
> > > >
> > > > Tom
> > > >
> > > > On Wed, 11 May 2022 at 09:08, Luke Chen <sh...@gmail.com> wrote:
> > > >
> > > > > > And if people start using RemainingLogs and RemainingSegments and
> > > then
> > > > > REALLY FEEL like they need RemainingBytes, then we can always add
> it
> > > in the
> > > > > future.
> > > > >
> > > > > +1
> > > > >
> > > > > Thanks James!
> > > > > Luke
> > > > >
> > > > > On Wed, May 11, 2022 at 3:57 PM James Cheng <wu...@gmail.com>
> > > wrote:
> > > > >
> > > > > > Hi Luke,
> > > > > >
> > > > > > Thanks for the detailed explanation. I agree that the current
> > > proposal of
> > > > > > RemainingLogs and RemainingSegments will greatly improve the
> > > situation,
> > > > > and
> > > > > > that we can go ahead with the KIP as is.
> > > > > >
> > > > > > If RemainingBytes were straight-forward to implement, then I’d
> like
> > > to
> > > > > > have it. But we can live without it for now. And if people start
> > > using
> > > > > > RemainingLogs and RemainingSegments and then REALLY FEEL like
> they
> > > need
> > > > > > RemainingBytes, then we can always add it in the future.
> > > > > >
> > > > > > Thanks Luke, for the detailed explanation, and for responding to
> my
> > > > > > feedback!
> > > > > >
> > > > > > -James
> > > > > >
> > > > > > Sent from my iPhone
> > > > > >
> > > > > > > On May 10, 2022, at 6:48 AM, Luke Chen <sh...@gmail.com>
> > wrote:
> > > > > > >
> > > > > > > Hi James and all,
> > > > > > >
> > > > > > > I checked again and I can see when creating UnifiedLog, we
> > > expected the
> > > > > > > logs/indexes/snapshots are in good state.
> > > > > > > So, I don't think we should break the current design to expose
> > the
> > > > > > > `RemainingBytesToRecovery`
> > > > > > > metric.
> > > > > > >
> > > > > > > If there is no other comments, I'll start a vote within this
> > week.
> > > > > > >
> > > > > > > Thank you.
> > > > > > > Luke
> > > > > > >
> > > > > > >> On Fri, May 6, 2022 at 6:00 PM Luke Chen <sh...@gmail.com>
> > > wrote:
> > > > > > >>
> > > > > > >> Hi James,
> > > > > > >>
> > > > > > >> Thanks for your input.
> > > > > > >>
> > > > > > >> For the `RemainingBytesToRecovery` metric proposal, I think
> > > there's
> > > > > one
> > > > > > >> thing I didn't make it clear.
> > > > > > >> Currently, when log manager start up, we'll try to load all
> logs
> > > > > > >> (segments), and during the log loading, we'll try to recover
> > logs
> > > if
> > > > > > >> necessary.
> > > > > > >> And the logs loading is using "thread pool" as you thought.
> > > > > > >>
> > > > > > >> So, here's the problem:
> > > > > > >> All segments in each log folder (partition) will be loaded in
> > > each log
> > > > > > >> recovery thread, and until it's loaded, we can know how many
> > > segments
> > > > > > (or
> > > > > > >> how many Bytes) needed to recover.
> > > > > > >> That means, if we have 10 partition logs in one broker, and we
> > > have 2
> > > > > > log
> > > > > > >> recovery threads (num.recovery.threads.per.data.dir=2), before
> > the
> > > > > > >> threads load the segments in each log, we only know how many
> > logs
> > > > > > >> (partitions) we have in the broker (i.e.
> RemainingLogsToRecover
> > > > > metric).
> > > > > > >> We cannot know how many segments/Bytes needed to recover until
> > > each
> > > > > > thread
> > > > > > >> starts to load the segments under one log (partition).
> > > > > > >>
> > > > > > >> So, the example in the KIP, it shows:
> > > > > > >> Currently, there are still 5 logs (partitions) needed to
> recover
> > > under
> > > > > > >> /tmp/log1 dir. And there are 2 threads doing the jobs, where
> one
> > > > > thread
> > > > > > has
> > > > > > >> 10000 segments needed to recover, and the other one has 3
> > segments
> > > > > > needed
> > > > > > >> to recover.
> > > > > > >>
> > > > > > >>   - kafka.log
> > > > > > >>      - LogManager
> > > > > > >>         - RemainingLogsToRecover
> > > > > > >>            - /tmp/log1 => 5            ← there are 5 logs
> under
> > > > > > >>            /tmp/log1 needed to be recovered
> > > > > > >>            - /tmp/log2 => 0
> > > > > > >>         - RemainingSegmentsToRecover
> > > > > > >>            - /tmp/log1                     ← 2 threads are
> doing
> > > log
> > > > > > >>            recovery for /tmp/log1
> > > > > > >>            - 0 => 10000         ← there are 10000 segments
> > needed
> > > to
> > > > > be
> > > > > > >>               recovered for thread 0
> > > > > > >>               - 1 => 3
> > > > > > >>               - /tmp/log2
> > > > > > >>               - 0 => 0
> > > > > > >>               - 1 => 0
> > > > > > >>
> > > > > > >>
> > > > > > >> So, after a while, the metrics might look like this:
> > > > > > >> It said, now, there are only 3 logs needed to recover in
> > > /tmp/log1,
> > > > > and
> > > > > > >> the thread 0 has 9000 segments left, and thread 1 has 5
> segments
> > > left
> > > > > > >> (which should imply the thread already completed 2 logs
> recovery
> > > in
> > > > > the
> > > > > > >> period)
> > > > > > >>
> > > > > > >>   - kafka.log
> > > > > > >>      - LogManager
> > > > > > >>         - RemainingLogsToRecover
> > > > > > >>            - /tmp/log1 => 3            ← there are 3 logs
> under
> > > > > > >>            /tmp/log1 needed to be recovered
> > > > > > >>            - /tmp/log2 => 0
> > > > > > >>         - RemainingSegmentsToRecover
> > > > > > >>            - /tmp/log1                     ← 2 threads are
> doing
> > > log
> > > > > > >>            recovery for /tmp/log1
> > > > > > >>            - 0 => 9000         ← there are 9000 segments
> needed
> > > to be
> > > > > > >>               recovered for thread 0
> > > > > > >>               - 1 => 5
> > > > > > >>               - /tmp/log2
> > > > > > >>               - 0 => 0
> > > > > > >>               - 1 => 0
> > > > > > >>
> > > > > > >>
> > > > > > >> That said, the `RemainingBytesToRecovery` metric is difficult
> to
> > > > > achieve
> > > > > > >> as you expected. I think the current proposal with
> > > > > > `RemainingLogsToRecover`
> > > > > > >> and `RemainingSegmentsToRecover` should already provide enough
> > > info
> > > > > for
> > > > > > >> the log recovery progress.
> > > > > > >>
> > > > > > >> I've also updated the KIP example to make it clear.
> > > > > > >>
> > > > > > >>
> > > > > > >> Thank you.
> > > > > > >> Luke
> > > > > > >>
> > > > > > >>
> > > > > > >>> On Thu, May 5, 2022 at 3:31 AM James Cheng <
> > wushujames@gmail.com
> > > >
> > > > > > wrote:
> > > > > > >>>
> > > > > > >>> Hi Luke,
> > > > > > >>>
> > > > > > >>> Thanks for adding RemainingSegmentsToRecovery.
> > > > > > >>>
> > > > > > >>> Another thought: different topics can have different segment
> > > sizes. I
> > > > > > >>> don't know how common it is, but it is possible. Some topics
> > > might
> > > > > want
> > > > > > >>> small segment sizes to allow more granular expiration of data.
> > > > > > >>>
> > > > > > >>> The downside of RemainingLogsToRecovery and
> > > > > RemainingSegmentsToRecovery
> > > > > > >>> is that the rate that they will decrement depends on the
> > > > > configuration
> > > > > > and
> > > > > > >>> patterns of the topics and partitions and segment sizes. If
> > > someone
> > > > > is
> > > > > > >>> monitoring those metrics, they might see times where the
> metric
> > > > > > decrements
> > > > > > >>> slowly, followed by a burst where it decrements quickly.
> > > > > > >>>
> > > > > > >>> What about RemainingBytesToRecovery? This would not depend on
> > the
> > > > > > >>> configuration of the topic or of the data. It would actually
> > be a
> > > > > > pretty
> > > > > > >>> good metric, because I think that this metric would change
> at a
> > > > > > constant
> > > > > > >>> rate (based on the disk I/O speed that the broker allocates
> to
> > > > > > recovery).
> > > > > > >>> Because it changes at a constant rate, you would be able to
> use
> > > the
> > > > > > >>> rate-of-change to predict when it hits zero, which will let
> you
> > > know
> > > > > > when
> > > > > > >>> the broker is going to start up. Like, I would imagine if we
> > > graphed
> > > > > > >>> RemainingBytesToRecovery that we'd see a fairly straight line
> > > that is
> > > > > > >>> decrementing at a steady rate towards zero.
> > > > > > >>>
> > > > > > >>> What do you think about adding RemainingBytesToRecovery?
> > > > > > >>>
> > > > > > >>> Or, what would you think about making the primary metric be
> > > > > > >>> RemainingBytesToRecovery, and getting rid of the others?
> > > > > > >>>
> > > > > > >>> I don't know if I personally would rather have all 3 metrics,
> > or
> > > > > would
> > > > > > >>> just use RemainingBytesToRecovery. I too would like more
> > > community
> > > > > > input
> > > > > > >>> on which of those metrics would be useful to people.
> > > > > > >>>
> > > > > > >>> About the JMX metrics, you said that if
> > > > > > >>> num.recovery.threads.per.data.dir=2, that there might be a
> > > separate
> > > > > > >>> RemainingSegmentsToRecovery counter for each thread. Is that
> > > actually
> > > > > > how
> > > > > > >>> the data is structured within the Kafka recovery threads?
> Does
> > > each
> > > > > > thread
> > > > > > >>> get a fixed set of partitions, or is there just one big pool
> of
> > > > > > partitions
> > > > > > >>> that the threads all work on?
> > > > > > >>>
> > > > > > >>> As a more concrete example:
> > > > > > >>> * If I have 9 small partitions and 1 big partition, and
> > > > > > >>> num.recovery.threads.per.data.dir=2
> > > > > > >>> Does each thread get 5 partitions, which means one thread
> will
> > > finish
> > > > > > >>> much sooner than the other?
> > > > > > >>> OR
> > > > > > >>> Do both threads just work on the set of 10 partitions, which
> > > means
> > > > > > likely
> > > > > > >>> 1 thread will be busy with the big partition, while the other
> > one
> > > > > ends
> > > > > > up
> > > > > > >>> plowing through the 9 small partitions?
> > > > > > >>>
> > > > > > >>> If each thread gets assigned 5 partitions, then it would make
> > > sense
> > > > > > that
> > > > > > >>> each thread has its own counter.
> > > > > > >>> If the threads work on a single pool of 10 partitions, then
> it
> > > would
> > > > > > >>> probably mean that the counter is on the pool of partitions
> > > itself,
> > > > > > and not
> > > > > > >>> on each thread.
> > > > > > >>>
> > > > > > >>> -James
> > > > > > >>>
> > > > > > >>>> On May 4, 2022, at 5:55 AM, Luke Chen <sh...@gmail.com>
> > > wrote:
> > > > > > >>>>
> > > > > > >>>> Hi devs,
> > > > > > >>>>
> > > > > > >>>> If there are no other comments, I'll start a vote tomorrow.
> > > > > > >>>>
> > > > > > >>>> Thank you.
> > > > > > >>>> Luke
> > > > > > >>>>
> > > > > > >>>> On Sun, May 1, 2022 at 5:08 PM Luke Chen <showuon@gmail.com
> >
> > > wrote:
> > > > > > >>>>
> > > > > > >>>>> Hi James,
> > > > > > >>>>>
> > > > > > >>>>> Sorry for the late reply.
> > > > > > >>>>>
> > > > > > >>>>> Yes, this is a good point, to know how many segments to be
> > > > > recovered
> > > > > > if
> > > > > > >>>>> there are some large partitions.
> > > > > > >>>>> I've updated the KIP, to add a
> `*RemainingSegmentsToRecover*`
> > > > > metric
> > > > > > >>> for
> > > > > > >>>>> each log recovery thread, to show the value.
> > > > > > >>>>> The example in the Proposed section here
> > > > > > >>>>> <
> > > > > > >>>
> > > > > >
> > > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-831%3A+Add+metric+for+log+recovery+progress#KIP831:Addmetricforlogrecoveryprogress-ProposedChanges
> > > > > > >>>>
> > > > > > >>>>> shows what it will look like.
> > > > > > >>>>>
> > > > > > >>>>> Thanks for the suggestion.
> > > > > > >>>>>
> > > > > > >>>>> Thank you.
> > > > > > >>>>> Luke
> > > > > > >>>>>
> > > > > > >>>>>
> > > > > > >>>>>
> > > > > > >>>>> On Sat, Apr 23, 2022 at 8:54 AM James Cheng <
> > > wushujames@gmail.com>
> > > > > > >>> wrote:
> > > > > > >>>>>
> > > > > > >>>>>> The KIP describes RemainingLogsToRecovery, which seems to
> be
> > > the
> > > > > > >>> number
> > > > > > >>>>>> of partitions in each log.dir.
> > > > > > >>>>>>
> > > > > > >>>>>> We have some partitions which are much much larger than
> > > others.
> > > > > > Those
> > > > > > >>>>>> large partitions have many many more segments than others.
> > > > > > >>>>>>
> > > > > > >>>>>> Is there a way the metric can reflect partition size?
> Could
> > > it be
> > > > > > >>>>>> RemainingSegmentsToRecover? Or even
> RemainingBytesToRecover?
> > > > > > >>>>>>
> > > > > > >>>>>> -James
> > > > > > >>>>>>
> > > > > > >>>>>> Sent from my iPhone
> > > > > > >>>>>>
> > > > > > >>>>>>> On Apr 20, 2022, at 2:01 AM, Luke Chen <
> showuon@gmail.com>
> > > > > wrote:
> > > > > > >>>>>>>
> > > > > > >>>>>>> Hi all,
> > > > > > >>>>>>>
> > > > > > >>>>>>> I'd like to propose a KIP to expose a metric for log
> > recovery
> > > > > > >>> progress.
> > > > > > >>>>>>> This metric would let the admins have a way to monitor
> the
> > > log
> > > > > > >>> recovery
> > > > > > >>>>>>> progress.
> > > > > > >>>>>>> Details can be found here:
> > > > > > >>>>>>>
> > > > > > >>>>>>
> > > > > > >>>
> > > > > >
> > > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-831%3A+Add+metric+for+log+recovery+progress
> > > > > > >>>>>>>
> > > > > > >>>>>>> Any feedback is appreciated.
> > > > > > >>>>>>>
> > > > > > >>>>>>> Thank you.
> > > > > > >>>>>>> Luke
> > > > > > >>>>>>
> > > > > > >>>>>
> > > > > > >>>
> > > > > > >>>
> > > > > >
> > > > >
> > >
> > >
> > >
> > > --
> > > Best Regards,
> > > Raman Verma
> > >
> >
>

Re: [DISCUSS] KIP-831: Add metric for log recovery progress

Posted by Tom Bentley <tb...@redhat.com>.
Thanks Luke! LGTM.

On Sun, 22 May 2022 at 05:18, Luke Chen <sh...@gmail.com> wrote:

> Hi Tom and Raman,
>
> Thanks for your comments.
>
> > 1. There's not a JIRA for this KIP (or the JIRA link needs updating).
> 2. Similarly the link to this discussion thread needs updating.
> > Please update the links to JIRA and the discussion thread.
>
> Yes, thanks for the reminder. I've updated the KIP.
>
> > 3. I wonder whether we need to keep these metrics (with value 0) once the
> broker enters the running state. Do you see it as valuable? A benefit of
> removing the metrics would be a reduction on storage required for metric
> stores which are recording these metrics.
>
> Yes, removing the metrics after log recovery has completed is a good idea.
> Updated the KIP.
>
> > 4. I think the KIP's public interfaces section could be a bit clearer.
> Previous KIPs which added metrics usually used a table, with the MBean
> name, metric type and description. See KIP-551 for example (or KIP-748,
> KIP-608). Similarly you could use a table in the proposed changes section
> rather than describing the tree you'd see in an MBean console.
>
> Good point! Updated the KIP to use a table to list the MBean name, metric
> type and descriptions.
>
>
> Thank you.
> Luke
>
> On Fri, May 20, 2022 at 9:13 AM Raman Verma <rv...@confluent.io.invalid>
> wrote:
>
> > Hi Luke,
> >
> > The change is useful and simple. Thanks.
> > Please update the links to JIRA and the discussion thread.
> >
> > Best Regards,
> > Raman Verma
> >
> > On Thu, May 19, 2022 at 8:57 AM Tom Bentley <tb...@redhat.com> wrote:
> > >
> > > Hi Luke,
> > >
> > > Thanks for the KIP. I think the idea makes sense and would provide
> useful
> > > observability of log recovery. I have a few comments.
> > >
> > > 1. There's not a JIRA for this KIP (or the JIRA link needs updating).
> > > 2. Similarly the link to this discussion thread needs updating.
> > > 3. I wonder whether we need to keep these metrics (with value 0) once
> the
> > > broker enters the running state. Do you see it as valuable? A benefit
> of
> > > removing the metrics would be a reduction on storage required for
> metric
> > > stores which are recording these metrics.
> > > 4. I think the KIP's public interfaces section could be a bit clearer.
> > > Previous KIPs which added metrics usually used a table, with the MBean
> > > name, metric type and description. See KIP-551 for example (or KIP-748,
> > > KIP-608). Similarly you could use a table in the proposed changes
> section
> > > rather than describing the tree you'd see in an MBean console.
> > >
> > > Kind regards,
> > >
> > > Tom
> > >
> > > On Wed, 11 May 2022 at 09:08, Luke Chen <sh...@gmail.com> wrote:
> > >
> > > > > And if people start using RemainingLogs and RemainingSegments and
> > then
> > > > REALLY FEEL like they need RemainingBytes, then we can always add it
> > in the
> > > > future.
> > > >
> > > > +1
> > > >
> > > > Thanks James!
> > > > Luke
> > > >
> > > > On Wed, May 11, 2022 at 3:57 PM James Cheng <wu...@gmail.com>
> > wrote:
> > > >
> > > > > Hi Luke,
> > > > >
> > > > > Thanks for the detailed explanation. I agree that the current
> > proposal of
> > > > > RemainingLogs and RemainingSegments will greatly improve the
> > situation,
> > > > and
> > > > > that we can go ahead with the KIP as is.
> > > > >
> > > > > If RemainingBytes were straight-forward to implement, then I’d like
> > to
> > > > > have it. But we can live without it for now. And if people start
> > using
> > > > > RemainingLogs and RemainingSegments and then REALLY FEEL like they
> > need
> > > > > RemainingBytes, then we can always add it in the future.
> > > > >
> > > > > Thanks Luke, for the detailed explanation, and for responding to my
> > > > > feedback!
> > > > >
> > > > > -James
> > > > >
> > > > > Sent from my iPhone
> > > > >
> > > > > > On May 10, 2022, at 6:48 AM, Luke Chen <sh...@gmail.com>
> wrote:
> > > > > >
> > > > > > Hi James and all,
> > > > > >
> > > > > > I checked again and I can see when creating UnifiedLog, we
> > expected the
> > > > > > logs/indexes/snapshots are in good state.
> > > > > > So, I don't think we should break the current design to expose
> the
> > > > > > `RemainingBytesToRecovery`
> > > > > > metric.
> > > > > >
> > > > > > If there is no other comments, I'll start a vote within this
> week.
> > > > > >
> > > > > > Thank you.
> > > > > > Luke
> > > > > >
> > > > > >> On Fri, May 6, 2022 at 6:00 PM Luke Chen <sh...@gmail.com>
> > wrote:
> > > > > >>
> > > > > >> Hi James,
> > > > > >>
> > > > > >> Thanks for your input.
> > > > > >>
> > > > > >> For the `RemainingBytesToRecovery` metric proposal, I think
> > there's
> > > > one
> > > > > >> thing I didn't make it clear.
> > > > > >> Currently, when log manager start up, we'll try to load all logs
> > > > > >> (segments), and during the log loading, we'll try to recover
> logs
> > if
> > > > > >> necessary.
> > > > > >> And the logs loading is using "thread pool" as you thought.
> > > > > >>
> > > > > >> So, here's the problem:
> > > > > >> All segments in each log folder (partition) will be loaded in
> > each log
> > > > > >> recovery thread, and until it's loaded, we can know how many
> > segments
> > > > > (or
> > > > > >> how many Bytes) needed to recover.
> > > > > >> That means, if we have 10 partition logs in one broker, and we
> > have 2
> > > > > log
> > > > > >> recovery threads (num.recovery.threads.per.data.dir=2), before
> the
> > > > > >> threads load the segments in each log, we only know how many
> logs
> > > > > >> (partitions) we have in the broker (i.e. RemainingLogsToRecover
> > > > metric).
> > > > > >> We cannot know how many segments/Bytes needed to recover until
> > each
> > > > > thread
> > > > > >> starts to load the segments under one log (partition).
> > > > > >>
> > > > > >> So, the example in the KIP, it shows:
> > > > > >> Currently, there are still 5 logs (partitions) needed to recover
> > under
> > > > > >> /tmp/log1 dir. And there are 2 threads doing the jobs, where one
> > > > thread
> > > > > has
> > > > > >> 10000 segments needed to recover, and the other one has 3
> segments
> > > > > needed
> > > > > >> to recover.
> > > > > >>
> > > > > >>   - kafka.log
> > > > > >>      - LogManager
> > > > > >>         - RemainingLogsToRecover
> > > > > >>            - /tmp/log1 => 5            ← there are 5 logs under
> > > > > >>            /tmp/log1 needed to be recovered
> > > > > >>            - /tmp/log2 => 0
> > > > > >>         - RemainingSegmentsToRecover
> > > > > >>            - /tmp/log1                     ← 2 threads are doing
> > log
> > > > > >>            recovery for /tmp/log1
> > > > > >>            - 0 => 10000         ← there are 10000 segments
> needed
> > to
> > > > be
> > > > > >>               recovered for thread 0
> > > > > >>               - 1 => 3
> > > > > >>               - /tmp/log2
> > > > > >>               - 0 => 0
> > > > > >>               - 1 => 0
> > > > > >>
> > > > > >>
> > > > > >> So, after a while, the metrics might look like this:
> > > > > >> It said, now, there are only 3 logs needed to recover in
> > /tmp/log1,
> > > > and
> > > > > >> the thread 0 has 9000 segments left, and thread 1 has 5 segments
> > left
> > > > > >> (which should imply the thread already completed 2 logs recovery
> > in
> > > > the
> > > > > >> period)
> > > > > >>
> > > > > >>   - kafka.log
> > > > > >>      - LogManager
> > > > > >>         - RemainingLogsToRecover
> > > > > >>            - /tmp/log1 => 3            ← there are 3 logs under
> > > > > >>            /tmp/log1 needed to be recovered
> > > > > >>            - /tmp/log2 => 0
> > > > > >>         - RemainingSegmentsToRecover
> > > > > >>            - /tmp/log1                     ← 2 threads are doing
> > log
> > > > > >>            recovery for /tmp/log1
> > > > > >>            - 0 => 9000         ← there are 9000 segments needed
> > to be
> > > > > >>               recovered for thread 0
> > > > > >>               - 1 => 5
> > > > > >>               - /tmp/log2
> > > > > >>               - 0 => 0
> > > > > >>               - 1 => 0
> > > > > >>
> > > > > >>
> > > > > >> That said, the `RemainingBytesToRecovery` metric is difficult to
> > > > achieve
> > > > > >> as you expected. I think the current proposal with
> > > > > `RemainingLogsToRecover`
> > > > > >> and `RemainingSegmentsToRecover` should already provide enough
> > info
> > > > for
> > > > > >> the log recovery progress.
> > > > > >>
> > > > > >> I've also updated the KIP example to make it clear.
> > > > > >>
> > > > > >>
> > > > > >> Thank you.
> > > > > >> Luke
> > > > > >>
> > > > > >>
> > > > > >>> On Thu, May 5, 2022 at 3:31 AM James Cheng <
> wushujames@gmail.com
> > >
> > > > > wrote:
> > > > > >>>
> > > > > >>> Hi Luke,
> > > > > >>>
> > > > > >>> Thanks for adding RemainingSegmentsToRecovery.
> > > > > >>>
> > > > > >>> Another thought: different topics can have different segment
> > sizes. I
> > > > > >>> don't know how common it is, but it is possible. Some topics
> > might
> > > > want
> > > > > >>> small segment sizes to allow more granular expiration of data.
> > > > > >>>
> > > > > >>> The downside of RemainingLogsToRecovery and
> > > > RemainingSegmentsToRecovery
> > > > > >>> is that the rate that they will decrement depends on the
> > > > configuration
> > > > > and
> > > > > >>> patterns of the topics and partitions and segment sizes. If
> > someone
> > > > is
> > > > > >>> monitoring those metrics, they might see times where the metric
> > > > > decrements
> > > > > >>> slowly, followed by a burst where it decrements quickly.
> > > > > >>>
> > > > > >>> What about RemainingBytesToRecovery? This would not depend on
> the
> > > > > >>> configuration of the topic or of the data. It would actually
> be a
> > > > > pretty
> > > > > >>> good metric, because I think that this metric would change at a
> > > > > constant
> > > > > >>> rate (based on the disk I/O speed that the broker allocates to
> > > > > recovery).
> > > > > >>> Because it changes at a constant rate, you would be able to use
> > the
> > > > > >>> rate-of-change to predict when it hits zero, which will let you
> > know
> > > > > when
> > > > > >>> the broker is going to start up. Like, I would imagine if we
> > graphed
> > > > > >>> RemainingBytesToRecovery that we'd see a fairly straight line
> > that is
> > > > > >>> decrementing at a steady rate towards zero.
> > > > > >>>
> > > > > >>> What do you think about adding RemainingBytesToRecovery?
> > > > > >>>
> > > > > >>> Or, what would you think about making the primary metric be
> > > > > >>> RemainingBytesToRecovery, and getting rid of the others?
> > > > > >>>
> > > > > >>> I don't know if I personally would rather have all 3 metrics,
> or
> > > > would
> > > > > >>> just use RemainingBytesToRecovery. I'd too would like more
> > community
> > > > > input
> > > > > >>> on which of those metrics would be useful to people.
> > > > > >>>
> > > > > >>> About the JMX metrics, you said that if
> > > > > >>> num.recovery.threads.per.data.dir=2, that there might be a
> > separate
> > > > > >>> RemainingSegmentsToRecovery counter for each thread. Is that
> > actually
> > > > > how
> > > > > >>> the data is structured within the Kafka recovery threads? Does
> > each
> > > > > thread
> > > > > >>> get a fixed set of partitions, or is there just one big pool of
> > > > > partitions
> > > > > >>> that the threads all work on?
> > > > > >>>
> > > > > >>> As a more concrete example:
> > > > > >>> * If I have 9 small partitions and 1 big partition, and
> > > > > >>> num.recovery.threads.per.data.dir=2
> > > > > >>> Does each thread get 5 partitions, which means one thread will
> > finish
> > > > > >>> much sooner than the other?
> > > > > >>> OR
> > > > > >>> Do both threads just work on the set of 10 partitions, which
> > means
> > > > > likely
> > > > > >>> 1 thread will be busy with the big partition, while the other
> one
> > > > ends
> > > > > up
> > > > > >>> plowing through the 9 small partitions?
> > > > > >>>
> > > > > >>> If each thread gets assigned 5 partitions, then it would make
> > sense
> > > > > that
> > > > > >>> each thread has its own counter.
> > > > > >>> If the threads works on a single pool of 10 partitions, then it
> > would
> > > > > >>> probably mean that the counter is on the pool of partitions
> > itself,
> > > > > and not
> > > > > >>> on each thread.
> > > > > >>>
> > > > > >>> -James
> > > > > >>>
> > > > > >>>> On May 4, 2022, at 5:55 AM, Luke Chen <sh...@gmail.com>
> > wrote:
> > > > > >>>>
> > > > > >>>> Hi devs,
> > > > > >>>>
> > > > > >>>> If there are no other comments, I'll start a vote tomorrow.
> > > > > >>>>
> > > > > >>>> Thank you.
> > > > > >>>> Luke
> > > > > >>>>
> > > > > >>>> On Sun, May 1, 2022 at 5:08 PM Luke Chen <sh...@gmail.com>
> > wrote:
> > > > > >>>>
> > > > > >>>>> Hi James,
> > > > > >>>>>
> > > > > >>>>> Sorry for the late reply.
> > > > > >>>>>
> > > > > >>>>> Yes, this is a good point, to know how many segments to be
> > > > recovered
> > > > > if
> > > > > >>>>> there are some large partitions.
> > > > > >>>>> I've updated the KIP, to add a `*RemainingSegmentsToRecover*`
> > > > metric
> > > > > >>> for
> > > > > >>>>> each log recovery thread, to show the value.
> > > > > >>>>> The example in the Proposed section here
> > > > > >>>>> <
> > > > > >>>
> > > > >
> > > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-831%3A+Add+metric+for+log+recovery+progress#KIP831:Addmetricforlogrecoveryprogress-ProposedChanges
> > > > > >>>>
> > > > > >>>>> shows what it will look like.
> > > > > >>>>>
> > > > > >>>>> Thanks for the suggestion.
> > > > > >>>>>
> > > > > >>>>> Thank you.
> > > > > >>>>> Luke
> > > > > >>>>>
> > > > > >>>>>
> > > > > >>>>>
> > > > > >>>>> On Sat, Apr 23, 2022 at 8:54 AM James Cheng <
> > wushujames@gmail.com>
> > > > > >>> wrote:
> > > > > >>>>>
> > > > > >>>>>> The KIP describes RemainingLogsToRecovery, which seems to be
> > the
> > > > > >>> number
> > > > > >>>>>> of partitions in each log.dir.
> > > > > >>>>>>
> > > > > >>>>>> We have some partitions which are much much larger than
> > others.
> > > > > Those
> > > > > >>>>>> large partitions have many many more segments than others.
> > > > > >>>>>>
> > > > > >>>>>> Is there a way the metric can reflect partition size? Could
> > it be
> > > > > >>>>>> RemainingSegmentsToRecover? Or even RemainingBytesToRecover?
> > > > > >>>>>>
> > > > > >>>>>> -James
> > > > > >>>>>>
> > > > > >>>>>> Sent from my iPhone
> > > > > >>>>>>
> > > > > >>>>>>> On Apr 20, 2022, at 2:01 AM, Luke Chen <sh...@gmail.com>
> > > > wrote:
> > > > > >>>>>>>
> > > > > >>>>>>> Hi all,
> > > > > >>>>>>>
> > > > > >>>>>>> I'd like to propose a KIP to expose a metric for log
> recovery
> > > > > >>> progress.
> > > > > >>>>>>> This metric would let the admins have a way to monitor the
> > log
> > > > > >>> recovery
> > > > > >>>>>>> progress.
> > > > > >>>>>>> Details can be found here:
> > > > > >>>>>>>
> > > > > >>>>>>
> > > > > >>>
> > > > >
> > > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-831%3A+Add+metric+for+log+recovery+progress
> > > > > >>>>>>>
> > > > > >>>>>>> Any feedback is appreciated.
> > > > > >>>>>>>
> > > > > >>>>>>> Thank you.
> > > > > >>>>>>> Luke
> > > > > >>>>>>
> > > > > >>>>>
> > > > > >>>
> > > > > >>>
> > > > >
> > > >
> >
> >
> >
> > --
> > Best Regards,
> > Raman Verma
> >
>

Re: [DISCUSS] KIP-831: Add metric for log recovery progress

Posted by Luke Chen <sh...@gmail.com>.
Hi Tom and Raman,

Thanks for your comments.

> 1. There's not a JIRA for this KIP (or the JIRA link needs updating).
2. Similarly the link to this discussion thread needs updating.
> Please update the links to JIRA and the discussion thread.

Yes, thanks for the reminder. I've updated the KIP.

> 3. I wonder whether we need to keep these metrics (with value 0) once the
broker enters the running state. Do you see it as valuable? A benefit of
removing the metrics would be a reduction on storage required for metric
stores which are recording these metrics.

Yes, removing the metrics after log recovery has completed is a good idea.
Updated the KIP.
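
To make point 3 concrete, here is a rough Java sketch of the register-then-remove
lifecycle, using only the standard JMX API. This is not the broker code; the class
names, the ObjectName pattern and the countdown loop are made up for illustration,
following the LogManager tree shown earlier in this thread.

    import java.lang.management.ManagementFactory;
    import java.util.concurrent.atomic.AtomicLong;
    import javax.management.MBeanServer;
    import javax.management.ObjectName;
    import javax.management.StandardMBean;

    public class RecoveryGaugeLifecycleSketch {

        // Hypothetical gauge contract; the real metric lives in the broker's LogManager.
        public interface RemainingLogsToRecoverMBean {
            long getValue();
        }

        public static void main(String[] args) throws Exception {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            // Assumed name pattern, one gauge per log.dir.
            ObjectName name = new ObjectName(
                "kafka.log:type=LogManager,name=RemainingLogsToRecover,dir=/tmp/log1");

            AtomicLong remaining = new AtomicLong(5);
            RemainingLogsToRecoverMBean gauge = remaining::get;
            server.registerMBean(
                new StandardMBean(gauge, RemainingLogsToRecoverMBean.class), name);

            // Stand-in for log recovery: each finished log decrements the gauge.
            while (remaining.get() > 0) {
                remaining.decrementAndGet();
            }

            // Recovery for this dir is done: drop the gauge so metric stores stop seeing it.
            server.unregisterMBean(name);
        }
    }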

> 4. I think the KIP's public interfaces section could be a bit clearer.
Previous KIPs which added metrics usually used a table, with the MBean
name, metric type and description. See KIP-551 for example (or KIP-748,
KIP-608). Similarly you could use a table in the proposed changes section
rather than describing the tree you'd see in an MBean console.

Good point! Updated the KIP to use a table to list the MBean name, metric
type and descriptions.
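
And once the table is in place, the MBean names in it are all a monitoring script
needs. Below is a rough, hypothetical Java sketch of polling them over remote JMX
while the broker is still recovering; the object-name pattern, the "Value"
attribute and the port are assumptions for the example, not part of the KIP.

    import java.util.Set;
    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class LogRecoveryProgressPoller {
        public static void main(String[] args) throws Exception {
            // Assumes the broker was started with remote JMX enabled on port 9999.
            JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");
            try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
                MBeanServerConnection conn = connector.getMBeanServerConnection();

                // Assumed pattern: one RemainingLogsToRecover gauge per log.dir.
                Set<ObjectName> gauges = conn.queryNames(new ObjectName(
                    "kafka.log:type=LogManager,name=RemainingLogsToRecover,*"), null);

                if (gauges.isEmpty()) {
                    System.out.println("No recovery gauges found; recovery is likely finished.");
                }
                for (ObjectName gauge : gauges) {
                    // Gauges are usually exposed through a "Value" attribute.
                    Object remaining = conn.getAttribute(gauge, "Value");
                    System.out.println(gauge + " => " + remaining);
                }
            }
        }
    }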


Thank you.
Luke

On Fri, May 20, 2022 at 9:13 AM Raman Verma <rv...@confluent.io.invalid>
wrote:

> Hi Luke,
>
> The change is useful and simple. Thanks.
> Please update the links to JIRA and the discussion thread.
>
> Best Regards,
> Raman Verma
>
> On Thu, May 19, 2022 at 8:57 AM Tom Bentley <tb...@redhat.com> wrote:
> >
> > Hi Luke,
> >
> > Thanks for the KIP. I think the idea makes sense and would provide useful
> > observability of log recovery. I have a few comments.
> >
> > 1. There's not a JIRA for this KIP (or the JIRA link needs updating).
> > 2. Similarly the link to this discussion thread needs updating.
> > 3. I wonder whether we need to keep these metrics (with value 0) once the
> > broker enters the running state. Do you see it as valuable? A benefit of
> > removing the metrics would be a reduction on storage required for metric
> > stores which are recording these metrics.
> > 4. I think the KIP's public interfaces section could be a bit clearer.
> > Previous KIPs which added metrics usually used a table, with the MBean
> > name, metric type and description. See KIP-551 for example (or KIP-748,
> > KIP-608). Similarly you could use a table in the proposed changes section
> > rather than describing the tree you'd see in an MBean console.
> >
> > Kind regards,
> >
> > Tom
> >
> > On Wed, 11 May 2022 at 09:08, Luke Chen <sh...@gmail.com> wrote:
> >
> > > > And if people start using RemainingLogs and RemainingSegments and
> then
> > > REALLY FEEL like they need RemainingBytes, then we can always add it
> in the
> > > future.
> > >
> > > +1
> > >
> > > Thanks James!
> > > Luke
> > >
> > > On Wed, May 11, 2022 at 3:57 PM James Cheng <wu...@gmail.com>
> wrote:
> > >
> > > > Hi Luke,
> > > >
> > > > Thanks for the detailed explanation. I agree that the current
> proposal of
> > > > RemainingLogs and RemainingSegments will greatly improve the
> situation,
> > > and
> > > > that we can go ahead with the KIP as is.
> > > >
> > > > If RemainingBytes were straight-forward to implement, then I’d like
> to
> > > > have it. But we can live without it for now. And if people start
> using
> > > > RemainingLogs and RemainingSegments and then REALLY FEEL like they
> need
> > > > RemainingBytes, then we can always add it in the future.
> > > >
> > > > Thanks Luke, for the detailed explanation, and for responding to my
> > > > feedback!
> > > >
> > > > -James
> > > >
> > > > Sent from my iPhone
> > > >
> > > > > On May 10, 2022, at 6:48 AM, Luke Chen <sh...@gmail.com> wrote:
> > > > >
> > > > > Hi James and all,
> > > > >
> > > > > I checked again and I can see when creating UnifiedLog, we
> expected the
> > > > > logs/indexes/snapshots are in good state.
> > > > > So, I don't think we should break the current design to expose the
> > > > > `RemainingBytesToRecovery`
> > > > > metric.
> > > > >
> > > > > If there is no other comments, I'll start a vote within this week.
> > > > >
> > > > > Thank you.
> > > > > Luke
> > > > >
> > > > >> On Fri, May 6, 2022 at 6:00 PM Luke Chen <sh...@gmail.com>
> wrote:
> > > > >>
> > > > >> Hi James,
> > > > >>
> > > > >> Thanks for your input.
> > > > >>
> > > > >> For the `RemainingBytesToRecovery` metric proposal, I think
> there's
> > > one
> > > > >> thing I didn't make it clear.
> > > > >> Currently, when log manager start up, we'll try to load all logs
> > > > >> (segments), and during the log loading, we'll try to recover logs
> if
> > > > >> necessary.
> > > > >> And the logs loading is using "thread pool" as you thought.
> > > > >>
> > > > >> So, here's the problem:
> > > > >> All segments in each log folder (partition) will be loaded in
> each log
> > > > >> recovery thread, and until it's loaded, we can know how many
> segments
> > > > (or
> > > > >> how many Bytes) needed to recover.
> > > > >> That means, if we have 10 partition logs in one broker, and we
> have 2
> > > > log
> > > > >> recovery threads (num.recovery.threads.per.data.dir=2), before the
> > > > >> threads load the segments in each log, we only know how many logs
> > > > >> (partitions) we have in the broker (i.e. RemainingLogsToRecover
> > > metric).
> > > > >> We cannot know how many segments/Bytes needed to recover until
> each
> > > > thread
> > > > >> starts to load the segments under one log (partition).
> > > > >>
> > > > >> So, the example in the KIP, it shows:
> > > > >> Currently, there are still 5 logs (partitions) needed to recover
> under
> > > > >> /tmp/log1 dir. And there are 2 threads doing the jobs, where one
> > > thread
> > > > has
> > > > >> 10000 segments needed to recover, and the other one has 3 segments
> > > > needed
> > > > >> to recover.
> > > > >>
> > > > >>   - kafka.log
> > > > >>      - LogManager
> > > > >>         - RemainingLogsToRecover
> > > > >>            - /tmp/log1 => 5            ← there are 5 logs under
> > > > >>            /tmp/log1 needed to be recovered
> > > > >>            - /tmp/log2 => 0
> > > > >>         - RemainingSegmentsToRecover
> > > > >>            - /tmp/log1                     ← 2 threads are doing
> log
> > > > >>            recovery for /tmp/log1
> > > > >>            - 0 => 10000         ← there are 10000 segments needed
> to
> > > be
> > > > >>               recovered for thread 0
> > > > >>               - 1 => 3
> > > > >>               - /tmp/log2
> > > > >>               - 0 => 0
> > > > >>               - 1 => 0
> > > > >>
> > > > >>
> > > > >> So, after a while, the metrics might look like this:
> > > > >> It said, now, there are only 4 logs needed to recover in
> /tmp/log1,
> > > and
> > > > >> the thread 0 has 9000 segments left, and thread 1 has 5 segments
> left
> > > > >> (which should imply the thread already completed 2 logs recovery
> in
> > > the
> > > > >> period)
> > > > >>
> > > > >>   - kafka.log
> > > > >>      - LogManager
> > > > >>         - RemainingLogsToRecover
> > > > >>            - /tmp/log1 => 3            ← there are 3 logs under
> > > > >>            /tmp/log1 needed to be recovered
> > > > >>            - /tmp/log2 => 0
> > > > >>         - RemainingSegmentsToRecover
> > > > >>            - /tmp/log1                     ← 2 threads are doing
> log
> > > > >>            recovery for /tmp/log1
> > > > >>            - 0 => 9000         ← there are 9000 segments needed
> to be
> > > > >>               recovered for thread 0
> > > > >>               - 1 => 5
> > > > >>               - /tmp/log2
> > > > >>               - 0 => 0
> > > > >>               - 1 => 0
> > > > >>
> > > > >>
> > > > >> That said, the `RemainingBytesToRecovery` metric is difficult to
> > > achieve
> > > > >> as you expected. I think the current proposal with
> > > > `RemainingLogsToRecover`
> > > > >> and `RemainingSegmentsToRecover` should already provide enough
> info
> > > for
> > > > >> the log recovery progress.
> > > > >>
> > > > >> I've also updated the KIP example to make it clear.
> > > > >>
> > > > >>
> > > > >> Thank you.
> > > > >> Luke
> > > > >>
> > > > >>
> > > > >>> On Thu, May 5, 2022 at 3:31 AM James Cheng <wushujames@gmail.com
> >
> > > > wrote:
> > > > >>>
> > > > >>> Hi Luke,
> > > > >>>
> > > > >>> Thanks for adding RemainingSegmentsToRecovery.
> > > > >>>
> > > > >>> Another thought: different topics can have different segment
> sizes. I
> > > > >>> don't know how common it is, but it is possible. Some topics
> might
> > > want
> > > > >>> small segment sizes to more granular expiration of data.
> > > > >>>
> > > > >>> The downside of RemainingLogsToRecovery and
> > > RemainingSegmentsToRecovery
> > > > >>> is that the rate that they will decrement depends on the
> > > configuration
> > > > and
> > > > >>> patterns of the topics and partitions and segment sizes. If
> someone
> > > is
> > > > >>> monitoring those metrics, they might see times where the metric
> > > > decrements
> > > > >>> slowly, followed by a burst where it decrements quickly.
> > > > >>>
> > > > >>> What about RemainingBytesToRecovery? This would not depend on the
> > > > >>> configuration of the topic or of the data. It would actually be a
> > > > pretty
> > > > >>> good metric, because I think that this metric would change at a
> > > > constant
> > > > >>> rate (based on the disk I/O speed that the broker allocates to
> > > > recovery).
> > > > >>> Because it changes at a constant rate, you would be able to use
> the
> > > > >>> rate-of-change to predict when it hits zero, which will let you
> know
> > > > when
> > > > >>> the broker is going to start up. Like, I would imagine if we
> graphed
> > > > >>> RemainingBytesToRecovery that we'd see a fairly straight line
> that is
> > > > >>> decrementing at a steady rate towards zero.
> > > > >>>
> > > > >>> What do you think about adding RemainingBytesToRecovery?
> > > > >>>
> > > > >>> Or, what would you think about making the primary metric be
> > > > >>> RemainingBytesToRecovery, and getting rid of the others?
> > > > >>>
> > > > >>> I don't know if I personally would rather have all 3 metrics, or
> > > would
> > > > >>> just use RemainingBytesToRecovery. I'd too would like more
> community
> > > > input
> > > > >>> on which of those metrics would be useful to people.
> > > > >>>
> > > > >>> About the JMX metrics, you said that if
> > > > >>> num.recovery.threads.per.data.dir=2, that there might be a
> separate
> > > > >>> RemainingSegmentsToRecovery counter for each thread. Is that
> actually
> > > > how
> > > > >>> the data is structured within the Kafka recovery threads? Does
> each
> > > > thread
> > > > >>> get a fixed set of partitions, or is there just one big pool of
> > > > partitions
> > > > >>> that the threads all work on?
> > > > >>>
> > > > >>> As a more concrete example:
> > > > >>> * If I have 9 small partitions and 1 big partition, and
> > > > >>> num.recovery.threads.per.data.dir=2
> > > > >>> Does each thread get 5 partitions, which means one thread will
> finish
> > > > >>> much sooner than the other?
> > > > >>> OR
> > > > >>> Do both threads just work on the set of 10 partitions, which
> means
> > > > likely
> > > > >>> 1 thread will be busy with the big partition, while the other one
> > > ends
> > > > up
> > > > >>> plowing through the 9 small partitions?
> > > > >>>
> > > > >>> If each thread gets assigned 5 partitions, then it would make
> sense
> > > > that
> > > > >>> each thread has its own counter.
> > > > >>> If the threads works on a single pool of 10 partitions, then it
> would
> > > > >>> probably mean that the counter is on the pool of partitions
> itself,
> > > > and not
> > > > >>> on each thread.
> > > > >>>
> > > > >>> -James
> > > > >>>
> > > > >>>> On May 4, 2022, at 5:55 AM, Luke Chen <sh...@gmail.com>
> wrote:
> > > > >>>>
> > > > >>>> Hi devs,
> > > > >>>>
> > > > >>>> If there are no other comments, I'll start a vote tomorrow.
> > > > >>>>
> > > > >>>> Thank you.
> > > > >>>> Luke
> > > > >>>>
> > > > >>>> On Sun, May 1, 2022 at 5:08 PM Luke Chen <sh...@gmail.com>
> wrote:
> > > > >>>>
> > > > >>>>> Hi James,
> > > > >>>>>
> > > > >>>>> Sorry for the late reply.
> > > > >>>>>
> > > > >>>>> Yes, this is a good point, to know how many segments to be
> > > recovered
> > > > if
> > > > >>>>> there are some large partitions.
> > > > >>>>> I've updated the KIP, to add a `*RemainingSegmentsToRecover*`
> > > metric
> > > > >>> for
> > > > >>>>> each log recovery thread, to show the value.
> > > > >>>>> The example in the Proposed section here
> > > > >>>>> <
> > > > >>>
> > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-831%3A+Add+metric+for+log+recovery+progress#KIP831:Addmetricforlogrecoveryprogress-ProposedChanges
> > > > >>>>
> > > > >>>>> shows what it will look like.
> > > > >>>>>
> > > > >>>>> Thanks for the suggestion.
> > > > >>>>>
> > > > >>>>> Thank you.
> > > > >>>>> Luke
> > > > >>>>>
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> On Sat, Apr 23, 2022 at 8:54 AM James Cheng <
> wushujames@gmail.com>
> > > > >>> wrote:
> > > > >>>>>
> > > > >>>>>> The KIP describes RemainingLogsToRecovery, which seems to be
> the
> > > > >>> number
> > > > >>>>>> of partitions in each log.dir.
> > > > >>>>>>
> > > > >>>>>> We have some partitions which are much much larger than
> others.
> > > > Those
> > > > >>>>>> large partitions have many many more segments than others.
> > > > >>>>>>
> > > > >>>>>> Is there a way the metric can reflect partition size? Could
> it be
> > > > >>>>>> RemainingSegmentsToRecover? Or even RemainingBytesToRecover?
> > > > >>>>>>
> > > > >>>>>> -James
> > > > >>>>>>
> > > > >>>>>> Sent from my iPhone
> > > > >>>>>>
> > > > >>>>>>> On Apr 20, 2022, at 2:01 AM, Luke Chen <sh...@gmail.com>
> > > wrote:
> > > > >>>>>>>
> > > > >>>>>>> Hi all,
> > > > >>>>>>>
> > > > >>>>>>> I'd like to propose a KIP to expose a metric for log recovery
> > > > >>> progress.
> > > > >>>>>>> This metric would let the admins have a way to monitor the
> log
> > > > >>> recovery
> > > > >>>>>>> progress.
> > > > >>>>>>> Details can be found here:
> > > > >>>>>>>
> > > > >>>>>>
> > > > >>>
> > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-831%3A+Add+metric+for+log+recovery+progress
> > > > >>>>>>>
> > > > >>>>>>> Any feedback is appreciated.
> > > > >>>>>>>
> > > > >>>>>>> Thank you.
> > > > >>>>>>> Luke
> > > > >>>>>>
> > > > >>>>>
> > > > >>>
> > > > >>>
> > > >
> > >
>
>
>
> --
> Best Regards,
> Raman Verma
>

Re: [DISCUSS] KIP-831: Add metric for log recovery progress

Posted by Raman Verma <rv...@confluent.io.INVALID>.
Hi Luke,

The change is useful and simple. Thanks.
Please update the links to JIRA and the discussion thread.

Best Regards,
Raman Verma

On Thu, May 19, 2022 at 8:57 AM Tom Bentley <tb...@redhat.com> wrote:
>
> Hi Luke,
>
> Thanks for the KIP. I think the idea makes sense and would provide useful
> observability of log recovery. I have a few comments.
>
> 1. There's not a JIRA for this KIP (or the JIRA link needs updating).
> 2. Similarly the link to this discussion thread needs updating.
> 3. I wonder whether we need to keep these metrics (with value 0) once the
> broker enters the running state. Do you see it as valuable? A benefit of
> removing the metrics would be a reduction on storage required for metric
> stores which are recording these metrics.
> 4. I think the KIP's public interfaces section could be a bit clearer.
> Previous KIPs which added metrics usually used a table, with the MBean
> name, metric type and description. See KIP-551 for example (or KIP-748,
> KIP-608). Similarly you could use a table in the proposed changes section
> rather than describing the tree you'd see in an MBean console.
>
> Kind regards,
>
> Tom
>
> On Wed, 11 May 2022 at 09:08, Luke Chen <sh...@gmail.com> wrote:
>
> > > And if people start using RemainingLogs and RemainingSegments and then
> > REALLY FEEL like they need RemainingBytes, then we can always add it in the
> > future.
> >
> > +1
> >
> > Thanks James!
> > Luke
> >
> > On Wed, May 11, 2022 at 3:57 PM James Cheng <wu...@gmail.com> wrote:
> >
> > > Hi Luke,
> > >
> > > Thanks for the detailed explanation. I agree that the current proposal of
> > > RemainingLogs and RemainingSegments will greatly improve the situation,
> > and
> > > that we can go ahead with the KIP as is.
> > >
> > > If RemainingBytes were straight-forward to implement, then I’d like to
> > > have it. But we can live without it for now. And if people start using
> > > RemainingLogs and RemainingSegments and then REALLY FEEL like they need
> > > RemainingBytes, then we can always add it in the future.
> > >
> > > Thanks Luke, for the detailed explanation, and for responding to my
> > > feedback!
> > >
> > > -James
> > >
> > > Sent from my iPhone
> > >
> > > > On May 10, 2022, at 6:48 AM, Luke Chen <sh...@gmail.com> wrote:
> > > >
> > > > Hi James and all,
> > > >
> > > > I checked again and I can see when creating UnifiedLog, we expected the
> > > > logs/indexes/snapshots are in good state.
> > > > So, I don't think we should break the current design to expose the
> > > > `RemainingBytesToRecovery`
> > > > metric.
> > > >
> > > > If there is no other comments, I'll start a vote within this week.
> > > >
> > > > Thank you.
> > > > Luke
> > > >
> > > >> On Fri, May 6, 2022 at 6:00 PM Luke Chen <sh...@gmail.com> wrote:
> > > >>
> > > >> Hi James,
> > > >>
> > > >> Thanks for your input.
> > > >>
> > > >> For the `RemainingBytesToRecovery` metric proposal, I think there's
> > one
> > > >> thing I didn't make it clear.
> > > >> Currently, when log manager start up, we'll try to load all logs
> > > >> (segments), and during the log loading, we'll try to recover logs if
> > > >> necessary.
> > > >> And the logs loading is using "thread pool" as you thought.
> > > >>
> > > >> So, here's the problem:
> > > >> All segments in each log folder (partition) will be loaded in each log
> > > >> recovery thread, and until it's loaded, we can know how many segments
> > > (or
> > > >> how many Bytes) needed to recover.
> > > >> That means, if we have 10 partition logs in one broker, and we have 2
> > > log
> > > >> recovery threads (num.recovery.threads.per.data.dir=2), before the
> > > >> threads load the segments in each log, we only know how many logs
> > > >> (partitions) we have in the broker (i.e. RemainingLogsToRecover
> > metric).
> > > >> We cannot know how many segments/Bytes needed to recover until each
> > > thread
> > > >> starts to load the segments under one log (partition).
> > > >>
> > > >> So, the example in the KIP, it shows:
> > > >> Currently, there are still 5 logs (partitions) needed to recover under
> > > >> /tmp/log1 dir. And there are 2 threads doing the jobs, where one
> > thread
> > > has
> > > >> 10000 segments needed to recover, and the other one has 3 segments
> > > needed
> > > >> to recover.
> > > >>
> > > >>   - kafka.log
> > > >>      - LogManager
> > > >>         - RemainingLogsToRecover
> > > >>            - /tmp/log1 => 5            ← there are 5 logs under
> > > >>            /tmp/log1 needed to be recovered
> > > >>            - /tmp/log2 => 0
> > > >>         - RemainingSegmentsToRecover
> > > >>            - /tmp/log1                     ← 2 threads are doing log
> > > >>            recovery for /tmp/log1
> > > >>            - 0 => 10000         ← there are 10000 segments needed to
> > be
> > > >>               recovered for thread 0
> > > >>               - 1 => 3
> > > >>               - /tmp/log2
> > > >>               - 0 => 0
> > > >>               - 1 => 0
> > > >>
> > > >>
> > > >> So, after a while, the metrics might look like this:
> > > >> It said, now, there are only 4 logs needed to recover in /tmp/log1,
> > and
> > > >> the thread 0 has 9000 segments left, and thread 1 has 5 segments left
> > > >> (which should imply the thread already completed 2 logs recovery in
> > the
> > > >> period)
> > > >>
> > > >>   - kafka.log
> > > >>      - LogManager
> > > >>         - RemainingLogsToRecover
> > > >>            - /tmp/log1 => 3            ← there are 3 logs under
> > > >>            /tmp/log1 needed to be recovered
> > > >>            - /tmp/log2 => 0
> > > >>         - RemainingSegmentsToRecover
> > > >>            - /tmp/log1                     ← 2 threads are doing log
> > > >>            recovery for /tmp/log1
> > > >>            - 0 => 9000         ← there are 9000 segments needed to be
> > > >>               recovered for thread 0
> > > >>               - 1 => 5
> > > >>               - /tmp/log2
> > > >>               - 0 => 0
> > > >>               - 1 => 0
> > > >>
> > > >>
> > > >> That said, the `RemainingBytesToRecovery` metric is difficult to
> > achieve
> > > >> as you expected. I think the current proposal with
> > > `RemainingLogsToRecover`
> > > >> and `RemainingSegmentsToRecover` should already provide enough info
> > for
> > > >> the log recovery progress.
> > > >>
> > > >> I've also updated the KIP example to make it clear.
> > > >>
> > > >>
> > > >> Thank you.
> > > >> Luke
> > > >>
> > > >>
> > > >>> On Thu, May 5, 2022 at 3:31 AM James Cheng <wu...@gmail.com>
> > > wrote:
> > > >>>
> > > >>> Hi Luke,
> > > >>>
> > > >>> Thanks for adding RemainingSegmentsToRecovery.
> > > >>>
> > > >>> Another thought: different topics can have different segment sizes. I
> > > >>> don't know how common it is, but it is possible. Some topics might
> > want
> > > >>> small segment sizes to more granular expiration of data.
> > > >>>
> > > >>> The downside of RemainingLogsToRecovery and
> > RemainingSegmentsToRecovery
> > > >>> is that the rate that they will decrement depends on the
> > configuration
> > > and
> > > >>> patterns of the topics and partitions and segment sizes. If someone
> > is
> > > >>> monitoring those metrics, they might see times where the metric
> > > decrements
> > > >>> slowly, followed by a burst where it decrements quickly.
> > > >>>
> > > >>> What about RemainingBytesToRecovery? This would not depend on the
> > > >>> configuration of the topic or of the data. It would actually be a
> > > pretty
> > > >>> good metric, because I think that this metric would change at a
> > > constant
> > > >>> rate (based on the disk I/O speed that the broker allocates to
> > > recovery).
> > > >>> Because it changes at a constant rate, you would be able to use the
> > > >>> rate-of-change to predict when it hits zero, which will let you know
> > > when
> > > >>> the broker is going to start up. Like, I would imagine if we graphed
> > > >>> RemainingBytesToRecovery that we'd see a fairly straight line that is
> > > >>> decrementing at a steady rate towards zero.
> > > >>>
> > > >>> What do you think about adding RemainingBytesToRecovery?
> > > >>>
> > > >>> Or, what would you think about making the primary metric be
> > > >>> RemainingBytesToRecovery, and getting rid of the others?
> > > >>>
> > > >>> I don't know if I personally would rather have all 3 metrics, or
> > would
> > > >>> just use RemainingBytesToRecovery. I'd too would like more community
> > > input
> > > >>> on which of those metrics would be useful to people.
> > > >>>
> > > >>> About the JMX metrics, you said that if
> > > >>> num.recovery.threads.per.data.dir=2, that there might be a separate
> > > >>> RemainingSegmentsToRecovery counter for each thread. Is that actually
> > > how
> > > >>> the data is structured within the Kafka recovery threads? Does each
> > > thread
> > > >>> get a fixed set of partitions, or is there just one big pool of
> > > partitions
> > > >>> that the threads all work on?
> > > >>>
> > > >>> As a more concrete example:
> > > >>> * If I have 9 small partitions and 1 big partition, and
> > > >>> num.recovery.threads.per.data.dir=2
> > > >>> Does each thread get 5 partitions, which means one thread will finish
> > > >>> much sooner than the other?
> > > >>> OR
> > > >>> Do both threads just work on the set of 10 partitions, which means
> > > likely
> > > >>> 1 thread will be busy with the big partition, while the other one
> > ends
> > > up
> > > >>> plowing through the 9 small partitions?
> > > >>>
> > > >>> If each thread gets assigned 5 partitions, then it would make sense
> > > that
> > > >>> each thread has its own counter.
> > > >>> If the threads works on a single pool of 10 partitions, then it would
> > > >>> probably mean that the counter is on the pool of partitions itself,
> > > and not
> > > >>> on each thread.
> > > >>>
> > > >>> -James
> > > >>>
> > > >>>> On May 4, 2022, at 5:55 AM, Luke Chen <sh...@gmail.com> wrote:
> > > >>>>
> > > >>>> Hi devs,
> > > >>>>
> > > >>>> If there are no other comments, I'll start a vote tomorrow.
> > > >>>>
> > > >>>> Thank you.
> > > >>>> Luke
> > > >>>>
> > > >>>> On Sun, May 1, 2022 at 5:08 PM Luke Chen <sh...@gmail.com> wrote:
> > > >>>>
> > > >>>>> Hi James,
> > > >>>>>
> > > >>>>> Sorry for the late reply.
> > > >>>>>
> > > >>>>> Yes, this is a good point, to know how many segments to be
> > recovered
> > > if
> > > >>>>> there are some large partitions.
> > > >>>>> I've updated the KIP, to add a `*RemainingSegmentsToRecover*`
> > metric
> > > >>> for
> > > >>>>> each log recovery thread, to show the value.
> > > >>>>> The example in the Proposed section here
> > > >>>>> <
> > > >>>
> > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-831%3A+Add+metric+for+log+recovery+progress#KIP831:Addmetricforlogrecoveryprogress-ProposedChanges
> > > >>>>
> > > >>>>> shows what it will look like.
> > > >>>>>
> > > >>>>> Thanks for the suggestion.
> > > >>>>>
> > > >>>>> Thank you.
> > > >>>>> Luke
> > > >>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>> On Sat, Apr 23, 2022 at 8:54 AM James Cheng <wu...@gmail.com>
> > > >>> wrote:
> > > >>>>>
> > > >>>>>> The KIP describes RemainingLogsToRecovery, which seems to be the
> > > >>> number
> > > >>>>>> of partitions in each log.dir.
> > > >>>>>>
> > > >>>>>> We have some partitions which are much much larger than others.
> > > Those
> > > >>>>>> large partitions have many many more segments than others.
> > > >>>>>>
> > > >>>>>> Is there a way the metric can reflect partition size? Could it be
> > > >>>>>> RemainingSegmentsToRecover? Or even RemainingBytesToRecover?
> > > >>>>>>
> > > >>>>>> -James
> > > >>>>>>
> > > >>>>>> Sent from my iPhone
> > > >>>>>>
> > > >>>>>>> On Apr 20, 2022, at 2:01 AM, Luke Chen <sh...@gmail.com>
> > wrote:
> > > >>>>>>>
> > > >>>>>>> Hi all,
> > > >>>>>>>
> > > >>>>>>> I'd like to propose a KIP to expose a metric for log recovery
> > > >>> progress.
> > > >>>>>>> This metric would let the admins have a way to monitor the log
> > > >>> recovery
> > > >>>>>>> progress.
> > > >>>>>>> Details can be found here:
> > > >>>>>>>
> > > >>>>>>
> > > >>>
> > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-831%3A+Add+metric+for+log+recovery+progress
> > > >>>>>>>
> > > >>>>>>> Any feedback is appreciated.
> > > >>>>>>>
> > > >>>>>>> Thank you.
> > > >>>>>>> Luke
> > > >>>>>>
> > > >>>>>
> > > >>>
> > > >>>
> > >
> >



-- 
Best Regards,
Raman Verma

Re: [DISCUSS] KIP-831: Add metric for log recovery progress

Posted by Tom Bentley <tb...@redhat.com>.
Hi Luke,

Thanks for the KIP. I think the idea makes sense and would provide useful
observability of log recovery. I have a few comments.

1. There's not a JIRA for this KIP (or the JIRA link needs updating).
2. Similarly the link to this discussion thread needs updating.
3. I wonder whether we need to keep these metrics (with value 0) once the
broker enters the running state. Do you see it as valuable? A benefit of
removing the metrics would be a reduction in the storage required for metric
stores which are recording these metrics.
4. I think the KIP's public interfaces section could be a bit clearer.
Previous KIPs which added metrics usually used a table, with the MBean
name, metric type and description. See KIP-551 for example (or KIP-748,
KIP-608). Similarly you could use a table in the proposed changes section
rather than describing the tree you'd see in an MBean console.

Kind regards,

Tom

On Wed, 11 May 2022 at 09:08, Luke Chen <sh...@gmail.com> wrote:

> > And if people start using RemainingLogs and RemainingSegments and then
> REALLY FEEL like they need RemainingBytes, then we can always add it in the
> future.
>
> +1
>
> Thanks James!
> Luke
>
> On Wed, May 11, 2022 at 3:57 PM James Cheng <wu...@gmail.com> wrote:
>
> > Hi Luke,
> >
> > Thanks for the detailed explanation. I agree that the current proposal of
> > RemainingLogs and RemainingSegments will greatly improve the situation,
> and
> > that we can go ahead with the KIP as is.
> >
> > If RemainingBytes were straight-forward to implement, then I’d like to
> > have it. But we can live without it for now. And if people start using
> > RemainingLogs and RemainingSegments and then REALLY FEEL like they need
> > RemainingBytes, then we can always add it in the future.
> >
> > Thanks Luke, for the detailed explanation, and for responding to my
> > feedback!
> >
> > -James
> >
> > Sent from my iPhone
> >
> > > On May 10, 2022, at 6:48 AM, Luke Chen <sh...@gmail.com> wrote:
> > >
> > > Hi James and all,
> > >
> > > I checked again and I can see when creating UnifiedLog, we expected the
> > > logs/indexes/snapshots are in good state.
> > > So, I don't think we should break the current design to expose the
> > > `RemainingBytesToRecovery`
> > > metric.
> > >
> > > If there is no other comments, I'll start a vote within this week.
> > >
> > > Thank you.
> > > Luke
> > >
> > >> On Fri, May 6, 2022 at 6:00 PM Luke Chen <sh...@gmail.com> wrote:
> > >>
> > >> Hi James,
> > >>
> > >> Thanks for your input.
> > >>
> > >> For the `RemainingBytesToRecovery` metric proposal, I think there's
> one
> > >> thing I didn't make it clear.
> > >> Currently, when log manager start up, we'll try to load all logs
> > >> (segments), and during the log loading, we'll try to recover logs if
> > >> necessary.
> > >> And the logs loading is using "thread pool" as you thought.
> > >>
> > >> So, here's the problem:
> > >> All segments in each log folder (partition) will be loaded in each log
> > >> recovery thread, and until it's loaded, we can know how many segments
> > (or
> > >> how many Bytes) needed to recover.
> > >> That means, if we have 10 partition logs in one broker, and we have 2
> > log
> > >> recovery threads (num.recovery.threads.per.data.dir=2), before the
> > >> threads load the segments in each log, we only know how many logs
> > >> (partitions) we have in the broker (i.e. RemainingLogsToRecover
> metric).
> > >> We cannot know how many segments/Bytes needed to recover until each
> > thread
> > >> starts to load the segments under one log (partition).
> > >>
> > >> So, the example in the KIP, it shows:
> > >> Currently, there are still 5 logs (partitions) needed to recover under
> > >> /tmp/log1 dir. And there are 2 threads doing the jobs, where one
> thread
> > has
> > >> 10000 segments needed to recover, and the other one has 3 segments
> > needed
> > >> to recover.
> > >>
> > >>   - kafka.log
> > >>      - LogManager
> > >>         - RemainingLogsToRecover
> > >>            - /tmp/log1 => 5            ← there are 5 logs under
> > >>            /tmp/log1 needed to be recovered
> > >>            - /tmp/log2 => 0
> > >>         - RemainingSegmentsToRecover
> > >>            - /tmp/log1                     ← 2 threads are doing log
> > >>            recovery for /tmp/log1
> > >>            - 0 => 10000         ← there are 10000 segments needed to
> be
> > >>               recovered for thread 0
> > >>               - 1 => 3
> > >>               - /tmp/log2
> > >>               - 0 => 0
> > >>               - 1 => 0
> > >>
> > >>
> > >> So, after a while, the metrics might look like this:
> > >> It said, now, there are only 4 logs needed to recover in /tmp/log1,
> and
> > >> the thread 0 has 9000 segments left, and thread 1 has 5 segments left
> > >> (which should imply the thread already completed 2 logs recovery in
> the
> > >> period)
> > >>
> > >>   - kafka.log
> > >>      - LogManager
> > >>         - RemainingLogsToRecover
> > >>            - /tmp/log1 => 3            ← there are 3 logs under
> > >>            /tmp/log1 needed to be recovered
> > >>            - /tmp/log2 => 0
> > >>         - RemainingSegmentsToRecover
> > >>            - /tmp/log1                     ← 2 threads are doing log
> > >>            recovery for /tmp/log1
> > >>            - 0 => 9000         ← there are 9000 segments needed to be
> > >>               recovered for thread 0
> > >>               - 1 => 5
> > >>               - /tmp/log2
> > >>               - 0 => 0
> > >>               - 1 => 0
> > >>
> > >>
> > >> That said, the `RemainingBytesToRecovery` metric is difficult to
> achieve
> > >> as you expected. I think the current proposal with
> > `RemainingLogsToRecover`
> > >> and `RemainingSegmentsToRecover` should already provide enough info
> for
> > >> the log recovery progress.
> > >>
> > >> I've also updated the KIP example to make it clear.
> > >>
> > >>
> > >> Thank you.
> > >> Luke
> > >>
> > >>
> > >>> On Thu, May 5, 2022 at 3:31 AM James Cheng <wu...@gmail.com>
> > wrote:
> > >>>
> > >>> Hi Luke,
> > >>>
> > >>> Thanks for adding RemainingSegmentsToRecovery.
> > >>>
> > >>> Another thought: different topics can have different segment sizes. I
> > >>> don't know how common it is, but it is possible. Some topics might
> want
> > >>> small segment sizes to more granular expiration of data.
> > >>>
> > >>> The downside of RemainingLogsToRecovery and
> RemainingSegmentsToRecovery
> > >>> is that the rate that they will decrement depends on the
> configuration
> > and
> > >>> patterns of the topics and partitions and segment sizes. If someone
> is
> > >>> monitoring those metrics, they might see times where the metric
> > decrements
> > >>> slowly, followed by a burst where it decrements quickly.
> > >>>
> > >>> What about RemainingBytesToRecovery? This would not depend on the
> > >>> configuration of the topic or of the data. It would actually be a
> > pretty
> > >>> good metric, because I think that this metric would change at a
> > constant
> > >>> rate (based on the disk I/O speed that the broker allocates to
> > recovery).
> > >>> Because it changes at a constant rate, you would be able to use the
> > >>> rate-of-change to predict when it hits zero, which will let you know
> > when
> > >>> the broker is going to start up. Like, I would imagine if we graphed
> > >>> RemainingBytesToRecovery that we'd see a fairly straight line that is
> > >>> decrementing at a steady rate towards zero.
> > >>>
> > >>> What do you think about adding RemainingBytesToRecovery?
> > >>>
> > >>> Or, what would you think about making the primary metric be
> > >>> RemainingBytesToRecovery, and getting rid of the others?
> > >>>
> > >>> I don't know if I personally would rather have all 3 metrics, or
> would
> > >>> just use RemainingBytesToRecovery. I'd too would like more community
> > input
> > >>> on which of those metrics would be useful to people.
> > >>>
> > >>> About the JMX metrics, you said that if
> > >>> num.recovery.threads.per.data.dir=2, that there might be a separate
> > >>> RemainingSegmentsToRecovery counter for each thread. Is that actually
> > how
> > >>> the data is structured within the Kafka recovery threads? Does each
> > thread
> > >>> get a fixed set of partitions, or is there just one big pool of
> > partitions
> > >>> that the threads all work on?
> > >>>
> > >>> As a more concrete example:
> > >>> * If I have 9 small partitions and 1 big partition, and
> > >>> num.recovery.threads.per.data.dir=2
> > >>> Does each thread get 5 partitions, which means one thread will finish
> > >>> much sooner than the other?
> > >>> OR
> > >>> Do both threads just work on the set of 10 partitions, which means
> > likely
> > >>> 1 thread will be busy with the big partition, while the other one
> ends
> > up
> > >>> plowing through the 9 small partitions?
> > >>>
> > >>> If each thread gets assigned 5 partitions, then it would make sense
> > that
> > >>> each thread has its own counter.
> > >>> If the threads works on a single pool of 10 partitions, then it would
> > >>> probably mean that the counter is on the pool of partitions itself,
> > and not
> > >>> on each thread.
> > >>>
> > >>> -James
> > >>>
> > >>>> On May 4, 2022, at 5:55 AM, Luke Chen <sh...@gmail.com> wrote:
> > >>>>
> > >>>> Hi devs,
> > >>>>
> > >>>> If there are no other comments, I'll start a vote tomorrow.
> > >>>>
> > >>>> Thank you.
> > >>>> Luke
> > >>>>
> > >>>> On Sun, May 1, 2022 at 5:08 PM Luke Chen <sh...@gmail.com> wrote:
> > >>>>
> > >>>>> Hi James,
> > >>>>>
> > >>>>> Sorry for the late reply.
> > >>>>>
> > >>>>> Yes, this is a good point, to know how many segments to be
> recovered
> > if
> > >>>>> there are some large partitions.
> > >>>>> I've updated the KIP, to add a `*RemainingSegmentsToRecover*`
> metric
> > >>> for
> > >>>>> each log recovery thread, to show the value.
> > >>>>> The example in the Proposed section here
> > >>>>> <
> > >>>
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-831%3A+Add+metric+for+log+recovery+progress#KIP831:Addmetricforlogrecoveryprogress-ProposedChanges
> > >>>>
> > >>>>> shows what it will look like.
> > >>>>>
> > >>>>> Thanks for the suggestion.
> > >>>>>
> > >>>>> Thank you.
> > >>>>> Luke
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> On Sat, Apr 23, 2022 at 8:54 AM James Cheng <wu...@gmail.com>
> > >>> wrote:
> > >>>>>
> > >>>>>> The KIP describes RemainingLogsToRecovery, which seems to be the
> > >>> number
> > >>>>>> of partitions in each log.dir.
> > >>>>>>
> > >>>>>> We have some partitions which are much much larger than others.
> > Those
> > >>>>>> large partitions have many many more segments than others.
> > >>>>>>
> > >>>>>> Is there a way the metric can reflect partition size? Could it be
> > >>>>>> RemainingSegmentsToRecover? Or even RemainingBytesToRecover?
> > >>>>>>
> > >>>>>> -James
> > >>>>>>
> > >>>>>> Sent from my iPhone
> > >>>>>>
> > >>>>>>> On Apr 20, 2022, at 2:01 AM, Luke Chen <sh...@gmail.com>
> wrote:
> > >>>>>>>
> > >>>>>>> Hi all,
> > >>>>>>>
> > >>>>>>> I'd like to propose a KIP to expose a metric for log recovery
> > >>> progress.
> > >>>>>>> This metric would let the admins have a way to monitor the log
> > >>> recovery
> > >>>>>>> progress.
> > >>>>>>> Details can be found here:
> > >>>>>>>
> > >>>>>>
> > >>>
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-831%3A+Add+metric+for+log+recovery+progress
> > >>>>>>>
> > >>>>>>> Any feedback is appreciated.
> > >>>>>>>
> > >>>>>>> Thank you.
> > >>>>>>> Luke
> > >>>>>>
> > >>>>>
> > >>>
> > >>>
> >
>

Re: [DISCUSS] KIP-831: Add metric for log recovery progress

Posted by Luke Chen <sh...@gmail.com>.
> And if people start using RemainingLogs and RemainingSegments and then
REALLY FEEL like they need RemainingBytes, then we can always add it in the
future.

+1

Thanks James!
Luke

On Wed, May 11, 2022 at 3:57 PM James Cheng <wu...@gmail.com> wrote:

> Hi Luke,
>
> Thanks for the detailed explanation. I agree that the current proposal of
> RemainingLogs and RemainingSegments will greatly improve the situation, and
> that we can go ahead with the KIP as is.
>
> If RemainingBytes were straight-forward to implement, then I’d like to
> have it. But we can live without it for now. And if people start using
> RemainingLogs and RemainingSegments and then REALLY FEEL like they need
> RemainingBytes, then we can always add it in the future.
>
> Thanks Luke, for the detailed explanation, and for responding to my
> feedback!
>
> -James
>
> Sent from my iPhone
>
> > On May 10, 2022, at 6:48 AM, Luke Chen <sh...@gmail.com> wrote:
> >
> > Hi James and all,
> >
> > I checked again and I can see when creating UnifiedLog, we expected the
> > logs/indexes/snapshots are in good state.
> > So, I don't think we should break the current design to expose the
> > `RemainingBytesToRecovery`
> > metric.
> >
> > If there is no other comments, I'll start a vote within this week.
> >
> > Thank you.
> > Luke
> >
> >> On Fri, May 6, 2022 at 6:00 PM Luke Chen <sh...@gmail.com> wrote:
> >>
> >> Hi James,
> >>
> >> Thanks for your input.
> >>
> >> For the `RemainingBytesToRecovery` metric proposal, I think there's one
> >> thing I didn't make it clear.
> >> Currently, when log manager start up, we'll try to load all logs
> >> (segments), and during the log loading, we'll try to recover logs if
> >> necessary.
> >> And the logs loading is using "thread pool" as you thought.
> >>
> >> So, here's the problem:
> >> All segments in each log folder (partition) will be loaded in each log
> >> recovery thread, and until it's loaded, we can know how many segments
> (or
> >> how many Bytes) needed to recover.
> >> That means, if we have 10 partition logs in one broker, and we have 2
> log
> >> recovery threads (num.recovery.threads.per.data.dir=2), before the
> >> threads load the segments in each log, we only know how many logs
> >> (partitions) we have in the broker (i.e. RemainingLogsToRecover metric).
> >> We cannot know how many segments/Bytes needed to recover until each
> thread
> >> starts to load the segments under one log (partition).
> >>
> >> So, the example in the KIP, it shows:
> >> Currently, there are still 5 logs (partitions) needed to recover under
> >> /tmp/log1 dir. And there are 2 threads doing the jobs, where one thread
> has
> >> 10000 segments needed to recover, and the other one has 3 segments
> needed
> >> to recover.
> >>
> >>   - kafka.log
> >>      - LogManager
> >>         - RemainingLogsToRecover
> >>            - /tmp/log1 => 5            ← there are 5 logs under
> >>            /tmp/log1 needed to be recovered
> >>            - /tmp/log2 => 0
> >>         - RemainingSegmentsToRecover
> >>            - /tmp/log1                     ← 2 threads are doing log
> >>            recovery for /tmp/log1
> >>            - 0 => 10000         ← there are 10000 segments needed to be
> >>               recovered for thread 0
> >>               - 1 => 3
> >>               - /tmp/log2
> >>               - 0 => 0
> >>               - 1 => 0
> >>
> >>
> >> So, after a while, the metrics might look like this:
> >> It said, now, there are only 4 logs needed to recover in /tmp/log1, and
> >> the thread 0 has 9000 segments left, and thread 1 has 5 segments left
> >> (which should imply the thread already completed 2 logs recovery in the
> >> period)
> >>
> >>   - kafka.log
> >>      - LogManager
> >>         - RemainingLogsToRecover
> >>            - /tmp/log1 => 3            ← there are 3 logs under
> >>            /tmp/log1 needed to be recovered
> >>            - /tmp/log2 => 0
> >>         - RemainingSegmentsToRecover
> >>            - /tmp/log1                     ← 2 threads are doing log
> >>            recovery for /tmp/log1
> >>            - 0 => 9000         ← there are 9000 segments needed to be
> >>               recovered for thread 0
> >>               - 1 => 5
> >>               - /tmp/log2
> >>               - 0 => 0
> >>               - 1 => 0
> >>
> >>
> >> That said, the `RemainingBytesToRecovery` metric is difficult to achieve
> >> as you expected. I think the current proposal with
> `RemainingLogsToRecover`
> >> and `RemainingSegmentsToRecover` should already provide enough info for
> >> the log recovery progress.
> >>
> >> I've also updated the KIP example to make it clear.
> >>
> >>
> >> Thank you.
> >> Luke
> >>
> >>
> >>> On Thu, May 5, 2022 at 3:31 AM James Cheng <wu...@gmail.com>
> wrote:
> >>>
> >>> Hi Luke,
> >>>
> >>> Thanks for adding RemainingSegmentsToRecovery.
> >>>
> >>> Another thought: different topics can have different segment sizes. I
> >>> don't know how common it is, but it is possible. Some topics might want
> >>> small segment sizes to more granular expiration of data.
> >>>
> >>> The downside of RemainingLogsToRecovery and RemainingSegmentsToRecovery
> >>> is that the rate that they will decrement depends on the configuration
> and
> >>> patterns of the topics and partitions and segment sizes. If someone is
> >>> monitoring those metrics, they might see times where the metric
> decrements
> >>> slowly, followed by a burst where it decrements quickly.
> >>>
> >>> What about RemainingBytesToRecovery? This would not depend on the
> >>> configuration of the topic or of the data. It would actually be a
> pretty
> >>> good metric, because I think that this metric would change at a
> constant
> >>> rate (based on the disk I/O speed that the broker allocates to
> recovery).
> >>> Because it changes at a constant rate, you would be able to use the
> >>> rate-of-change to predict when it hits zero, which will let you know
> when
> >>> the broker is going to start up. Like, I would imagine if we graphed
> >>> RemainingBytesToRecovery that we'd see a fairly straight line that is
> >>> decrementing at a steady rate towards zero.
> >>>
> >>> What do you think about adding RemainingBytesToRecovery?
> >>>
> >>> Or, what would you think about making the primary metric be
> >>> RemainingBytesToRecovery, and getting rid of the others?
> >>>
> >>> I don't know if I personally would rather have all 3 metrics, or would
> >>> just use RemainingBytesToRecovery. I'd too would like more community
> input
> >>> on which of those metrics would be useful to people.
> >>>
> >>> About the JMX metrics, you said that if
> >>> num.recovery.threads.per.data.dir=2, that there might be a separate
> >>> RemainingSegmentsToRecovery counter for each thread. Is that actually
> how
> >>> the data is structured within the Kafka recovery threads? Does each
> thread
> >>> get a fixed set of partitions, or is there just one big pool of
> partitions
> >>> that the threads all work on?
> >>>
> >>> As a more concrete example:
> >>> * If I have 9 small partitions and 1 big partition, and
> >>> num.recovery.threads.per.data.dir=2
> >>> Does each thread get 5 partitions, which means one thread will finish
> >>> much sooner than the other?
> >>> OR
> >>> Do both threads just work on the set of 10 partitions, which means
> likely
> >>> 1 thread will be busy with the big partition, while the other one ends
> up
> >>> plowing through the 9 small partitions?
> >>>
> >>> If each thread gets assigned 5 partitions, then it would make sense
> that
> >>> each thread has its own counter.
> >>> If the threads works on a single pool of 10 partitions, then it would
> >>> probably mean that the counter is on the pool of partitions itself,
> and not
> >>> on each thread.
> >>>
> >>> -James
> >>>
> >>>> On May 4, 2022, at 5:55 AM, Luke Chen <sh...@gmail.com> wrote:
> >>>>
> >>>> Hi devs,
> >>>>
> >>>> If there are no other comments, I'll start a vote tomorrow.
> >>>>
> >>>> Thank you.
> >>>> Luke
> >>>>
> >>>> On Sun, May 1, 2022 at 5:08 PM Luke Chen <sh...@gmail.com> wrote:
> >>>>
> >>>>> Hi James,
> >>>>>
> >>>>> Sorry for the late reply.
> >>>>>
> >>>>> Yes, this is a good point, to know how many segments to be recovered
> if
> >>>>> there are some large partitions.
> >>>>> I've updated the KIP, to add a `*RemainingSegmentsToRecover*` metric
> >>> for
> >>>>> each log recovery thread, to show the value.
> >>>>> The example in the Proposed section here
> >>>>> <
> >>>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-831%3A+Add+metric+for+log+recovery+progress#KIP831:Addmetricforlogrecoveryprogress-ProposedChanges
> >>>>
> >>>>> shows what it will look like.
> >>>>>
> >>>>> Thanks for the suggestion.
> >>>>>
> >>>>> Thank you.
> >>>>> Luke
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Sat, Apr 23, 2022 at 8:54 AM James Cheng <wu...@gmail.com>
> >>> wrote:
> >>>>>
> >>>>>> The KIP describes RemainingLogsToRecovery, which seems to be the
> >>> number
> >>>>>> of partitions in each log.dir.
> >>>>>>
> >>>>>> We have some partitions which are much much larger than others.
> Those
> >>>>>> large partitions have many many more segments than others.
> >>>>>>
> >>>>>> Is there a way the metric can reflect partition size? Could it be
> >>>>>> RemainingSegmentsToRecover? Or even RemainingBytesToRecover?
> >>>>>>
> >>>>>> -James
> >>>>>>
> >>>>>> Sent from my iPhone
> >>>>>>
> >>>>>>> On Apr 20, 2022, at 2:01 AM, Luke Chen <sh...@gmail.com> wrote:
> >>>>>>>
> >>>>>>> Hi all,
> >>>>>>>
> >>>>>>> I'd like to propose a KIP to expose a metric for log recovery
> >>> progress.
> >>>>>>> This metric would let the admins have a way to monitor the log
> >>> recovery
> >>>>>>> progress.
> >>>>>>> Details can be found here:
> >>>>>>>
> >>>>>>
> >>>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-831%3A+Add+metric+for+log+recovery+progress
> >>>>>>>
> >>>>>>> Any feedback is appreciated.
> >>>>>>>
> >>>>>>> Thank you.
> >>>>>>> Luke
> >>>>>>
> >>>>>
> >>>
> >>>
>

Re: [DISCUSS] KIP-831: Add metric for log recovery progress

Posted by James Cheng <wu...@gmail.com>.
Hi Luke,

Thanks for the detailed explanation. I agree that the current proposal of RemainingLogs and RemainingSegments will greatly improve the situation, and that we can go ahead with the KIP as is.

If RemainingBytes were straight-forward to implement, then I’d like to have it. But we can live without it for now. And if people start using RemainingLogs and RemainingSegments and then REALLY FEEL like they need RemainingBytes, then we can always add it in the future.

Thanks Luke, for the detailed explanation, and for responding to my feedback!

-James

Sent from my iPhone

> On May 10, 2022, at 6:48 AM, Luke Chen <sh...@gmail.com> wrote:
> 
> Hi James and all,
> 
> I checked again and I can see when creating UnifiedLog, we expected the
> logs/indexes/snapshots are in good state.
> So, I don't think we should break the current design to expose the
> `RemainingBytesToRecovery`
> metric.
> 
> If there is no other comments, I'll start a vote within this week.
> 
> Thank you.
> Luke
> 
>> On Fri, May 6, 2022 at 6:00 PM Luke Chen <sh...@gmail.com> wrote:
>> 
>> Hi James,
>> 
>> Thanks for your input.
>> 
>> For the `RemainingBytesToRecovery` metric proposal, I think there's one
>> thing I didn't make it clear.
>> Currently, when log manager start up, we'll try to load all logs
>> (segments), and during the log loading, we'll try to recover logs if
>> necessary.
>> And the logs loading is using "thread pool" as you thought.
>> 
>> So, here's the problem:
>> All segments in each log folder (partition) will be loaded in each log
>> recovery thread, and until it's loaded, we can know how many segments (or
>> how many Bytes) needed to recover.
>> That means, if we have 10 partition logs in one broker, and we have 2 log
>> recovery threads (num.recovery.threads.per.data.dir=2), before the
>> threads load the segments in each log, we only know how many logs
>> (partitions) we have in the broker (i.e. RemainingLogsToRecover metric).
>> We cannot know how many segments/Bytes needed to recover until each thread
>> starts to load the segments under one log (partition).
>> 
>> So, the example in the KIP, it shows:
>> Currently, there are still 5 logs (partitions) needed to recover under
>> /tmp/log1 dir. And there are 2 threads doing the jobs, where one thread has
>> 10000 segments needed to recover, and the other one has 3 segments needed
>> to recover.
>> 
>>   - kafka.log
>>      - LogManager
>>         - RemainingLogsToRecover
>>            - /tmp/log1 => 5      ← there are 5 logs under /tmp/log1 needed to be recovered
>>            - /tmp/log2 => 0
>>         - RemainingSegmentsToRecover
>>            - /tmp/log1           ← 2 threads are doing log recovery for /tmp/log1
>>               - 0 => 10000       ← there are 10000 segments needed to be recovered for thread 0
>>               - 1 => 3
>>            - /tmp/log2
>>               - 0 => 0
>>               - 1 => 0
>> 
>> 
>> So, after a while, the metrics might look like this:
>> It said, now, there are only 4 logs needed to recover in /tmp/log1, and
>> the thread 0 has 9000 segments left, and thread 1 has 5 segments left
>> (which should imply the thread already completed 2 logs recovery in the
>> period)
>> 
>>   - kafka.log
>>      - LogManager
>>         - RemainingLogsToRecover
>>            - /tmp/log1 => 3      ← there are 3 logs under /tmp/log1 needed to be recovered
>>            - /tmp/log2 => 0
>>         - RemainingSegmentsToRecover
>>            - /tmp/log1           ← 2 threads are doing log recovery for /tmp/log1
>>               - 0 => 9000        ← there are 9000 segments needed to be recovered for thread 0
>>               - 1 => 5
>>            - /tmp/log2
>>               - 0 => 0
>>               - 1 => 0
>> 
>> 
>> That said, the `RemainingBytesToRecovery` metric is difficult to achieve
>> as you expected. I think the current proposal with `RemainingLogsToRecover`
>> and `RemainingSegmentsToRecover` should already provide enough info for
>> the log recovery progress.
>> 
>> I've also updated the KIP example to make it clear.
>> 
>> 
>> Thank you.
>> Luke
>> 
>> 
>>> On Thu, May 5, 2022 at 3:31 AM James Cheng <wu...@gmail.com> wrote:
>>> 
>>> Hi Luke,
>>> 
>>> Thanks for adding RemainingSegmentsToRecovery.
>>> 
>>> Another thought: different topics can have different segment sizes. I
>>> don't know how common it is, but it is possible. Some topics might want
>>> small segment sizes to more granular expiration of data.
>>> 
>>> The downside of RemainingLogsToRecovery and RemainingSegmentsToRecovery
>>> is that the rate that they will decrement depends on the configuration and
>>> patterns of the topics and partitions and segment sizes. If someone is
>>> monitoring those metrics, they might see times where the metric decrements
>>> slowly, followed by a burst where it decrements quickly.
>>> 
>>> What about RemainingBytesToRecovery? This would not depend on the
>>> configuration of the topic or of the data. It would actually be a pretty
>>> good metric, because I think that this metric would change at a constant
>>> rate (based on the disk I/O speed that the broker allocates to recovery).
>>> Because it changes at a constant rate, you would be able to use the
>>> rate-of-change to predict when it hits zero, which will let you know when
>>> the broker is going to start up. Like, I would imagine if we graphed
>>> RemainingBytesToRecovery that we'd see a fairly straight line that is
>>> decrementing at a steady rate towards zero.
>>> 
>>> What do you think about adding RemainingBytesToRecovery?
>>> 
>>> Or, what would you think about making the primary metric be
>>> RemainingBytesToRecovery, and getting rid of the others?
>>> 
>>> I don't know if I personally would rather have all 3 metrics, or would
>>> just use RemainingBytesToRecovery. I'd too would like more community input
>>> on which of those metrics would be useful to people.
>>> 
>>> About the JMX metrics, you said that if
>>> num.recovery.threads.per.data.dir=2, that there might be a separate
>>> RemainingSegmentsToRecovery counter for each thread. Is that actually how
>>> the data is structured within the Kafka recovery threads? Does each thread
>>> get a fixed set of partitions, or is there just one big pool of partitions
>>> that the threads all work on?
>>> 
>>> As a more concrete example:
>>> * If I have 9 small partitions and 1 big partition, and
>>> num.recovery.threads.per.data.dir=2
>>> Does each thread get 5 partitions, which means one thread will finish
>>> much sooner than the other?
>>> OR
>>> Do both threads just work on the set of 10 partitions, which means likely
>>> 1 thread will be busy with the big partition, while the other one ends up
>>> plowing through the 9 small partitions?
>>> 
>>> If each thread gets assigned 5 partitions, then it would make sense that
>>> each thread has its own counter.
>>> If the threads works on a single pool of 10 partitions, then it would
>>> probably mean that the counter is on the pool of partitions itself, and not
>>> on each thread.
>>> 
>>> -James
>>> 
>>>> On May 4, 2022, at 5:55 AM, Luke Chen <sh...@gmail.com> wrote:
>>>> 
>>>> Hi devs,
>>>> 
>>>> If there are no other comments, I'll start a vote tomorrow.
>>>> 
>>>> Thank you.
>>>> Luke
>>>> 
>>>> On Sun, May 1, 2022 at 5:08 PM Luke Chen <sh...@gmail.com> wrote:
>>>> 
>>>>> Hi James,
>>>>> 
>>>>> Sorry for the late reply.
>>>>> 
>>>>> Yes, this is a good point, to know how many segments to be recovered if
>>>>> there are some large partitions.
>>>>> I've updated the KIP, to add a `*RemainingSegmentsToRecover*` metric
>>> for
>>>>> each log recovery thread, to show the value.
>>>>> The example in the Proposed section here
>>>>> <
>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-831%3A+Add+metric+for+log+recovery+progress#KIP831:Addmetricforlogrecoveryprogress-ProposedChanges
>>>> 
>>>>> shows what it will look like.
>>>>> 
>>>>> Thanks for the suggestion.
>>>>> 
>>>>> Thank you.
>>>>> Luke
>>>>> 
>>>>> 
>>>>> 
>>>>> On Sat, Apr 23, 2022 at 8:54 AM James Cheng <wu...@gmail.com>
>>> wrote:
>>>>> 
>>>>>> The KIP describes RemainingLogsToRecovery, which seems to be the
>>> number
>>>>>> of partitions in each log.dir.
>>>>>> 
>>>>>> We have some partitions which are much much larger than others. Those
>>>>>> large partitions have many many more segments than others.
>>>>>> 
>>>>>> Is there a way the metric can reflect partition size? Could it be
>>>>>> RemainingSegmentsToRecover? Or even RemainingBytesToRecover?
>>>>>> 
>>>>>> -James
>>>>>> 
>>>>>> Sent from my iPhone
>>>>>> 
>>>>>>> On Apr 20, 2022, at 2:01 AM, Luke Chen <sh...@gmail.com> wrote:
>>>>>>> 
>>>>>>> Hi all,
>>>>>>> 
>>>>>>> I'd like to propose a KIP to expose a metric for log recovery
>>> progress.
>>>>>>> This metric would let the admins have a way to monitor the log
>>> recovery
>>>>>>> progress.
>>>>>>> Details can be found here:
>>>>>>> 
>>>>>> 
>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-831%3A+Add+metric+for+log+recovery+progress
>>>>>>> 
>>>>>>> Any feedback is appreciated.
>>>>>>> 
>>>>>>> Thank you.
>>>>>>> Luke
>>>>>> 
>>>>> 
>>> 
>>> 

Re: [DISCUSS] KIP-831: Add metric for log recovery progress

Posted by Luke Chen <sh...@gmail.com>.
Hi James and all,

I checked again, and I can see that when creating a UnifiedLog, we expect the
logs/indexes/snapshots to be in a good state.
So, I don't think we should break the current design to expose the
`RemainingBytesToRecovery` metric.

If there are no other comments, I'll start a vote within this week.

Thank you.
Luke

On Fri, May 6, 2022 at 6:00 PM Luke Chen <sh...@gmail.com> wrote:

> Hi James,
>
> Thanks for your input.
>
> For the `RemainingBytesToRecovery` metric proposal, I think there's one
> thing I didn't make it clear.
> Currently, when log manager start up, we'll try to load all logs
> (segments), and during the log loading, we'll try to recover logs if
> necessary.
> And the logs loading is using "thread pool" as you thought.
>
> So, here's the problem:
> All segments in each log folder (partition) will be loaded in each log
> recovery thread, and until it's loaded, we can know how many segments (or
> how many Bytes) needed to recover.
> That means, if we have 10 partition logs in one broker, and we have 2 log
> recovery threads (num.recovery.threads.per.data.dir=2), before the
> threads load the segments in each log, we only know how many logs
> (partitions) we have in the broker (i.e. RemainingLogsToRecover metric).
> We cannot know how many segments/Bytes needed to recover until each thread
> starts to load the segments under one log (partition).
>
> So, the example in the KIP, it shows:
> Currently, there are still 5 logs (partitions) needed to recover under
> /tmp/log1 dir. And there are 2 threads doing the jobs, where one thread has
> 10000 segments needed to recover, and the other one has 3 segments needed
> to recover.
>
>    - kafka.log
>       - LogManager
>          - RemainingLogsToRecover
>             - /tmp/log1 => 5      ← there are 5 logs under /tmp/log1 needed to be recovered
>             - /tmp/log2 => 0
>          - RemainingSegmentsToRecover
>             - /tmp/log1           ← 2 threads are doing log recovery for /tmp/log1
>                - 0 => 10000       ← there are 10000 segments needed to be recovered for thread 0
>                - 1 => 3
>             - /tmp/log2
>                - 0 => 0
>                - 1 => 0
>
>
> So, after a while, the metrics might look like this:
> It said, now, there are only 4 logs needed to recover in /tmp/log1, and
> the thread 0 has 9000 segments left, and thread 1 has 5 segments left
> (which should imply the thread already completed 2 logs recovery in the
> period)
>
>    - kafka.log
>       - LogManager
>          - RemainingLogsToRecover
>             - /tmp/log1 => 3      ← there are 3 logs under /tmp/log1 needed to be recovered
>             - /tmp/log2 => 0
>          - RemainingSegmentsToRecover
>             - /tmp/log1           ← 2 threads are doing log recovery for /tmp/log1
>                - 0 => 9000        ← there are 9000 segments needed to be recovered for thread 0
>                - 1 => 5
>             - /tmp/log2
>                - 0 => 0
>                - 1 => 0
>
>
> That said, the `RemainingBytesToRecovery` metric is difficult to achieve
> as you expected. I think the current proposal with `RemainingLogsToRecover`
> and `RemainingSegmentsToRecover` should already provide enough info for
> the log recovery progress.
>
> I've also updated the KIP example to make it clear.
>
>
> Thank you.
> Luke
>
>
> On Thu, May 5, 2022 at 3:31 AM James Cheng <wu...@gmail.com> wrote:
>
>> Hi Luke,
>>
>> Thanks for adding RemainingSegmentsToRecovery.
>>
>> Another thought: different topics can have different segment sizes. I
>> don't know how common it is, but it is possible. Some topics might want
>> small segment sizes to more granular expiration of data.
>>
>> The downside of RemainingLogsToRecovery and RemainingSegmentsToRecovery
>> is that the rate that they will decrement depends on the configuration and
>> patterns of the topics and partitions and segment sizes. If someone is
>> monitoring those metrics, they might see times where the metric decrements
>> slowly, followed by a burst where it decrements quickly.
>>
>> What about RemainingBytesToRecovery? This would not depend on the
>> configuration of the topic or of the data. It would actually be a pretty
>> good metric, because I think that this metric would change at a constant
>> rate (based on the disk I/O speed that the broker allocates to recovery).
>> Because it changes at a constant rate, you would be able to use the
>> rate-of-change to predict when it hits zero, which will let you know when
>> the broker is going to start up. Like, I would imagine if we graphed
>> RemainingBytesToRecovery that we'd see a fairly straight line that is
>> decrementing at a steady rate towards zero.
>>
>> What do you think about adding RemainingBytesToRecovery?
>>
>> Or, what would you think about making the primary metric be
>> RemainingBytesToRecovery, and getting rid of the others?
>>
>> I don't know if I personally would rather have all 3 metrics, or would
>> just use RemainingBytesToRecovery. I'd too would like more community input
>> on which of those metrics would be useful to people.
>>
>> About the JMX metrics, you said that if
>> num.recovery.threads.per.data.dir=2, that there might be a separate
>> RemainingSegmentsToRecovery counter for each thread. Is that actually how
>> the data is structured within the Kafka recovery threads? Does each thread
>> get a fixed set of partitions, or is there just one big pool of partitions
>> that the threads all work on?
>>
>> As a more concrete example:
>> * If I have 9 small partitions and 1 big partition, and
>> num.recovery.threads.per.data.dir=2
>> Does each thread get 5 partitions, which means one thread will finish
>> much sooner than the other?
>> OR
>> Do both threads just work on the set of 10 partitions, which means likely
>> 1 thread will be busy with the big partition, while the other one ends up
>> plowing through the 9 small partitions?
>>
>> If each thread gets assigned 5 partitions, then it would make sense that
>> each thread has its own counter.
>> If the threads works on a single pool of 10 partitions, then it would
>> probably mean that the counter is on the pool of partitions itself, and not
>> on each thread.
>>
>> -James
>>
>> > On May 4, 2022, at 5:55 AM, Luke Chen <sh...@gmail.com> wrote:
>> >
>> > Hi devs,
>> >
>> > If there are no other comments, I'll start a vote tomorrow.
>> >
>> > Thank you.
>> > Luke
>> >
>> > On Sun, May 1, 2022 at 5:08 PM Luke Chen <sh...@gmail.com> wrote:
>> >
>> >> Hi James,
>> >>
>> >> Sorry for the late reply.
>> >>
>> >> Yes, this is a good point, to know how many segments to be recovered if
>> >> there are some large partitions.
>> >> I've updated the KIP, to add a `*RemainingSegmentsToRecover*` metric
>> for
>> >> each log recovery thread, to show the value.
>> >> The example in the Proposed section here
>> >> <
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-831%3A+Add+metric+for+log+recovery+progress#KIP831:Addmetricforlogrecoveryprogress-ProposedChanges
>> >
>> >> shows what it will look like.
>> >>
>> >> Thanks for the suggestion.
>> >>
>> >> Thank you.
>> >> Luke
>> >>
>> >>
>> >>
>> >> On Sat, Apr 23, 2022 at 8:54 AM James Cheng <wu...@gmail.com>
>> wrote:
>> >>
>> >>> The KIP describes RemainingLogsToRecovery, which seems to be the
>> number
>> >>> of partitions in each log.dir.
>> >>>
>> >>> We have some partitions which are much much larger than others. Those
>> >>> large partitions have many many more segments than others.
>> >>>
>> >>> Is there a way the metric can reflect partition size? Could it be
>> >>> RemainingSegmentsToRecover? Or even RemainingBytesToRecover?
>> >>>
>> >>> -James
>> >>>
>> >>> Sent from my iPhone
>> >>>
>> >>>> On Apr 20, 2022, at 2:01 AM, Luke Chen <sh...@gmail.com> wrote:
>> >>>>
>> >>>> Hi all,
>> >>>>
>> >>>> I'd like to propose a KIP to expose a metric for log recovery
>> progress.
>> >>>> This metric would let the admins have a way to monitor the log
>> recovery
>> >>>> progress.
>> >>>> Details can be found here:
>> >>>>
>> >>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-831%3A+Add+metric+for+log+recovery+progress
>> >>>>
>> >>>> Any feedback is appreciated.
>> >>>>
>> >>>> Thank you.
>> >>>> Luke
>> >>>
>> >>
>>
>>

Re: [DISCUSS] KIP-831: Add metric for log recovery progress

Posted by Luke Chen <sh...@gmail.com>.
Hi James,

Thanks for your input.

For the `RemainingBytesToRecovery` metric proposal, I think there's one
thing I didn't make clear.
Currently, when the log manager starts up, it tries to load all logs
(segments), and during the log loading, it recovers logs if necessary.
And the log loading does use a thread pool, as you thought.

So, here's the problem:
All segments in one log folder (partition) are loaded by a single log
recovery thread, and only after they are loaded can we know how many
segments (or how many bytes) need to be recovered.
That means, if we have 10 partition logs in one broker and 2 log recovery
threads (num.recovery.threads.per.data.dir=2), then before the threads load
the segments in each log we only know how many logs (partitions) there are
in the broker (i.e. the RemainingLogsToRecover metric). We cannot know how
many segments/bytes need to be recovered until each thread starts to load
the segments under one log (partition). See the config sketch below for the
setup assumed in the example.
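
As a reference, here is a minimal sketch of the broker settings assumed in
the example below (the two log dirs and the thread count are just the values
used for illustration):

    # server.properties (sketch for this example only)
    log.dirs=/tmp/log1,/tmp/log2
    num.recovery.threads.per.data.dir=2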

So, the example in the KIP shows:
Currently, there are still 5 logs (partitions) to be recovered under the
/tmp/log1 dir. And there are 2 threads doing the job, where one thread has
10000 segments to recover, and the other one has 3 segments to recover.

   - kafka.log
      - LogManager
         - RemainingLogsToRecover
            - /tmp/log1 => 5      ← there are 5 logs under /tmp/log1 needed to be recovered
            - /tmp/log2 => 0
         - RemainingSegmentsToRecover
            - /tmp/log1           ← 2 threads are doing log recovery for /tmp/log1
               - 0 => 10000       ← there are 10000 segments needed to be recovered for thread 0
               - 1 => 3
            - /tmp/log2
               - 0 => 0
               - 1 => 0


So, after a while, the metrics might look like this:
It shows that now there are only 3 logs left to recover in /tmp/log1, thread
0 has 9000 segments left, and thread 1 has 5 segments left (which implies
that 2 logs were completely recovered in that period)

   - kafka.log
      - LogManager
         - RemainingLogsToRecover
            - /tmp/log1 => 3      ← there are 3 logs under /tmp/log1 needed to be recovered
            - /tmp/log2 => 0
         - RemainingSegmentsToRecover
            - /tmp/log1           ← 2 threads are doing log recovery for /tmp/log1
               - 0 => 9000        ← there are 9000 segments needed to be recovered for thread 0
               - 1 => 5
            - /tmp/log2
               - 0 => 0
               - 1 => 0


That said, the `RemainingBytesToRecovery` metric is difficult to achieve in
the way you expected. I think the current proposal with
`RemainingLogsToRecover` and `RemainingSegmentsToRecover` should already
provide enough info about the log recovery progress.
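
To make the shape of these metrics concrete, here is a rough sketch of how
an operator could poll them over JMX once they are implemented. The
ObjectName pattern and the "Value" attribute below are only my assumptions
for illustration; the final names are defined in the KIP/implementation:

    // RecoveryProgressProbe.java -- illustrative sketch, not part of the KIP
    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;
    import java.util.Set;

    public class RecoveryProgressProbe {
        public static void main(String[] args) throws Exception {
            // Assumed JMX endpoint of the recovering broker
            JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");
            JMXConnector connector = JMXConnectorFactory.connect(url);
            try {
                MBeanServerConnection conn = connector.getMBeanServerConnection();
                // Assumed pattern: one gauge per recovery thread (and per log dir)
                ObjectName pattern = new ObjectName(
                    "kafka.log:type=LogManager,name=RemainingSegmentsToRecover,*");
                Set<ObjectName> names = conn.queryNames(pattern, null);
                long total = 0;
                for (ObjectName name : names) {
                    Number value = (Number) conn.getAttribute(name, "Value");
                    System.out.println(name + " => " + value);
                    total += value.longValue();
                }
                System.out.println("Total remaining segments: " + total);
            } finally {
                connector.close();
            }
        }
    }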

I've also updated the KIP example to make it clear.


Thank you.
Luke


On Thu, May 5, 2022 at 3:31 AM James Cheng <wu...@gmail.com> wrote:

> Hi Luke,
>
> Thanks for adding RemainingSegmentsToRecovery.
>
> Another thought: different topics can have different segment sizes. I
> don't know how common it is, but it is possible. Some topics might want
> small segment sizes to more granular expiration of data.
>
> The downside of RemainingLogsToRecovery and RemainingSegmentsToRecovery is
> that the rate that they will decrement depends on the configuration and
> patterns of the topics and partitions and segment sizes. If someone is
> monitoring those metrics, they might see times where the metric decrements
> slowly, followed by a burst where it decrements quickly.
>
> What about RemainingBytesToRecovery? This would not depend on the
> configuration of the topic or of the data. It would actually be a pretty
> good metric, because I think that this metric would change at a constant
> rate (based on the disk I/O speed that the broker allocates to recovery).
> Because it changes at a constant rate, you would be able to use the
> rate-of-change to predict when it hits zero, which will let you know when
> the broker is going to start up. Like, I would imagine if we graphed
> RemainingBytesToRecovery that we'd see a fairly straight line that is
> decrementing at a steady rate towards zero.
>
> What do you think about adding RemainingBytesToRecovery?
>
> Or, what would you think about making the primary metric be
> RemainingBytesToRecovery, and getting rid of the others?
>
> I don't know if I personally would rather have all 3 metrics, or would
> just use RemainingBytesToRecovery. I'd too would like more community input
> on which of those metrics would be useful to people.
>
> About the JMX metrics, you said that if
> num.recovery.threads.per.data.dir=2, that there might be a separate
> RemainingSegmentsToRecovery counter for each thread. Is that actually how
> the data is structured within the Kafka recovery threads? Does each thread
> get a fixed set of partitions, or is there just one big pool of partitions
> that the threads all work on?
>
> As a more concrete example:
> * If I have 9 small partitions and 1 big partition, and
> num.recovery.threads.per.data.dir=2
> Does each thread get 5 partitions, which means one thread will finish much
> sooner than the other?
> OR
> Do both threads just work on the set of 10 partitions, which means likely
> 1 thread will be busy with the big partition, while the other one ends up
> plowing through the 9 small partitions?
>
> If each thread gets assigned 5 partitions, then it would make sense that
> each thread has its own counter.
> If the threads works on a single pool of 10 partitions, then it would
> probably mean that the counter is on the pool of partitions itself, and not
> on each thread.
>
> -James
>
> > On May 4, 2022, at 5:55 AM, Luke Chen <sh...@gmail.com> wrote:
> >
> > Hi devs,
> >
> > If there are no other comments, I'll start a vote tomorrow.
> >
> > Thank you.
> > Luke
> >
> > On Sun, May 1, 2022 at 5:08 PM Luke Chen <sh...@gmail.com> wrote:
> >
> >> Hi James,
> >>
> >> Sorry for the late reply.
> >>
> >> Yes, this is a good point, to know how many segments to be recovered if
> >> there are some large partitions.
> >> I've updated the KIP, to add a `*RemainingSegmentsToRecover*` metric for
> >> each log recovery thread, to show the value.
> >> The example in the Proposed section here
> >> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-831%3A+Add+metric+for+log+recovery+progress#KIP831:Addmetricforlogrecoveryprogress-ProposedChanges
> >
> >> shows what it will look like.
> >>
> >> Thanks for the suggestion.
> >>
> >> Thank you.
> >> Luke
> >>
> >>
> >>
> >> On Sat, Apr 23, 2022 at 8:54 AM James Cheng <wu...@gmail.com>
> wrote:
> >>
> >>> The KIP describes RemainingLogsToRecovery, which seems to be the number
> >>> of partitions in each log.dir.
> >>>
> >>> We have some partitions which are much much larger than others. Those
> >>> large partitions have many many more segments than others.
> >>>
> >>> Is there a way the metric can reflect partition size? Could it be
> >>> RemainingSegmentsToRecover? Or even RemainingBytesToRecover?
> >>>
> >>> -James
> >>>
> >>> Sent from my iPhone
> >>>
> >>>> On Apr 20, 2022, at 2:01 AM, Luke Chen <sh...@gmail.com> wrote:
> >>>>
> >>>> Hi all,
> >>>>
> >>>> I'd like to propose a KIP to expose a metric for log recovery
> progress.
> >>>> This metric would let the admins have a way to monitor the log
> recovery
> >>>> progress.
> >>>> Details can be found here:
> >>>>
> >>>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-831%3A+Add+metric+for+log+recovery+progress
> >>>>
> >>>> Any feedback is appreciated.
> >>>>
> >>>> Thank you.
> >>>> Luke
> >>>
> >>
>
>

Re: [DISCUSS] KIP-831: Add metric for log recovery progress

Posted by James Cheng <wu...@gmail.com>.
Hi Luke,

Thanks for adding RemainingSegmentsToRecovery.

Another thought: different topics can have different segment sizes. I don't know how common it is, but it is possible. Some topics might want small segment sizes to get more granular expiration of data.

The downside of RemainingLogsToRecovery and RemainingSegmentsToRecovery is that the rate at which they decrement depends on the configuration and data patterns of the topics, partitions, and segment sizes. If someone is monitoring those metrics, they might see times where the metric decrements slowly, followed by a burst where it decrements quickly.

What about RemainingBytesToRecovery? This would not depend on the configuration of the topic or of the data. It would actually be a pretty good metric, because I think that this metric would change at a constant rate (based on the disk I/O speed that the broker allocates to recovery). Because it changes at a constant rate, you would be able to use the rate-of-change to predict when it hits zero, which will let you know when the broker is going to start up. Like, I would imagine if we graphed RemainingBytesToRecovery that we'd see a fairly straight line that is decrementing at a steady rate towards zero.
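
As a rough sketch of the kind of estimate that would enable (purely
hypothetical helper code, assuming we sample the gauge periodically -- no
such metric exists yet):

    // EtaSketch.java -- illustration only
    public class EtaSketch {
        // Estimate time-to-finish from two samples of a hypothetical
        // RemainingBytesToRecovery gauge taken intervalSeconds apart.
        static long estimateSecondsRemaining(long bytesAtT0, long bytesAtT1, long intervalSeconds) {
            long recovered = bytesAtT0 - bytesAtT1;        // bytes recovered between samples
            if (recovered <= 0) {
                return Long.MAX_VALUE;                     // no observable progress yet
            }
            double bytesPerSecond = (double) recovered / intervalSeconds;
            return (long) (bytesAtT1 / bytesPerSecond);    // extrapolate down to zero
        }

        public static void main(String[] args) {
            // e.g. 500 GB remaining, then 498 GB remaining 60 seconds later
            // -> ~33 MB/s, so roughly 15000 seconds (~4 hours) until the broker is up
            System.out.println(estimateSecondsRemaining(500_000_000_000L, 498_000_000_000L, 60));
        }
    }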

What do you think about adding RemainingBytesToRecovery? 

Or, what would you think about making the primary metric be RemainingBytesToRecovery, and getting rid of the others?

I don't know if I personally would rather have all 3 metrics, or would just use RemainingBytesToRecovery. I too would like more community input on which of those metrics would be useful to people.

About the JMX metrics, you said that if num.recovery.threads.per.data.dir=2, there might be a separate RemainingSegmentsToRecovery counter for each thread. Is that actually how the data is structured within the Kafka recovery threads? Does each thread get a fixed set of partitions, or is there just one big pool of partitions that the threads all work on?

As a more concrete example:
* If I have 9 small partitions and 1 big partition, and num.recovery.threads.per.data.dir=2
Does each thread get 5 partitions, which means one thread will finish much sooner than the other?
OR
Do both threads just work on the set of 10 partitions, which means likely 1 thread will be busy with the big partition, while the other one ends up plowing through the 9 small partitions?

If each thread gets assigned 5 partitions, then it would make sense that each thread has its own counter.
If the threads work on a single pool of 10 partitions, then it would probably mean that the counter is on the pool of partitions itself, and not on each thread. A toy calculation of the two scenarios is sketched below.
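
Just to make the skew concrete (all sizes are made up: 1 unit per small
partition, 100 units for the big one, 2 recovery threads):

    // RecoverySkewToy.java -- toy model of the 9-small + 1-big example above
    import java.util.PriorityQueue;

    public class RecoverySkewToy {
        public static void main(String[] args) {
            long[] partitions = {100, 1, 1, 1, 1, 1, 1, 1, 1, 1};

            // (a) Fixed split: first 5 partitions to thread A, last 5 to thread B.
            long threadA = 0, threadB = 0;
            for (int i = 0; i < partitions.length; i++) {
                if (i < 5) threadA += partitions[i]; else threadB += partitions[i];
            }
            // threadA = 104, threadB = 5 -> broker ready after ~104 units
            System.out.println("fixed split finishes at " + Math.max(threadA, threadB));

            // (b) Shared pool: the next partition goes to whichever thread frees up first.
            PriorityQueue<Long> threads = new PriorityQueue<>();
            threads.add(0L);
            threads.add(0L);
            for (long p : partitions) {
                threads.add(threads.poll() + p);
            }
            long finish = 0;
            for (long t : threads) finish = Math.max(finish, t);
            // one thread carries the big partition (100) while the other drains the 9 small ones
            System.out.println("shared pool finishes at " + finish);
        }
    }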

-James

> On May 4, 2022, at 5:55 AM, Luke Chen <sh...@gmail.com> wrote:
> 
> Hi devs,
> 
> If there are no other comments, I'll start a vote tomorrow.
> 
> Thank you.
> Luke
> 
> On Sun, May 1, 2022 at 5:08 PM Luke Chen <sh...@gmail.com> wrote:
> 
>> Hi James,
>> 
>> Sorry for the late reply.
>> 
>> Yes, this is a good point, to know how many segments to be recovered if
>> there are some large partitions.
>> I've updated the KIP, to add a `*RemainingSegmentsToRecover*` metric for
>> each log recovery thread, to show the value.
>> The example in the Proposed section here
>> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-831%3A+Add+metric+for+log+recovery+progress#KIP831:Addmetricforlogrecoveryprogress-ProposedChanges>
>> shows what it will look like.
>> 
>> Thanks for the suggestion.
>> 
>> Thank you.
>> Luke
>> 
>> 
>> 
>> On Sat, Apr 23, 2022 at 8:54 AM James Cheng <wu...@gmail.com> wrote:
>> 
>>> The KIP describes RemainingLogsToRecovery, which seems to be the number
>>> of partitions in each log.dir.
>>> 
>>> We have some partitions which are much much larger than others. Those
>>> large partitions have many many more segments than others.
>>> 
>>> Is there a way the metric can reflect partition size? Could it be
>>> RemainingSegmentsToRecover? Or even RemainingBytesToRecover?
>>> 
>>> -James
>>> 
>>> Sent from my iPhone
>>> 
>>>> On Apr 20, 2022, at 2:01 AM, Luke Chen <sh...@gmail.com> wrote:
>>>> 
>>>> Hi all,
>>>> 
>>>> I'd like to propose a KIP to expose a metric for log recovery progress.
>>>> This metric would let the admins have a way to monitor the log recovery
>>>> progress.
>>>> Details can be found here:
>>>> 
>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-831%3A+Add+metric+for+log+recovery+progress
>>>> 
>>>> Any feedback is appreciated.
>>>> 
>>>> Thank you.
>>>> Luke
>>> 
>> 


Re: [DISCUSS] KIP-831: Add metric for log recovery progress

Posted by Luke Chen <sh...@gmail.com>.
Hi devs,

If there are no other comments, I'll start a vote tomorrow.

Thank you.
Luke

On Sun, May 1, 2022 at 5:08 PM Luke Chen <sh...@gmail.com> wrote:

> Hi James,
>
> Sorry for the late reply.
>
> Yes, this is a good point, to know how many segments to be recovered if
> there are some large partitions.
> I've updated the KIP, to add a `*RemainingSegmentsToRecover*` metric for
> each log recovery thread, to show the value.
> The example in the Proposed section here
> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-831%3A+Add+metric+for+log+recovery+progress#KIP831:Addmetricforlogrecoveryprogress-ProposedChanges>
> shows what it will look like.
>
> Thanks for the suggestion.
>
> Thank you.
> Luke
>
>
>
> On Sat, Apr 23, 2022 at 8:54 AM James Cheng <wu...@gmail.com> wrote:
>
>> The KIP describes RemainingLogsToRecovery, which seems to be the number
>> of partitions in each log.dir.
>>
>> We have some partitions which are much much larger than others. Those
>> large partitions have many many more segments than others.
>>
>> Is there a way the metric can reflect partition size? Could it be
>> RemainingSegmentsToRecover? Or even RemainingBytesToRecover?
>>
>> -James
>>
>> Sent from my iPhone
>>
>> > On Apr 20, 2022, at 2:01 AM, Luke Chen <sh...@gmail.com> wrote:
>> >
>> > Hi all,
>> >
>> > I'd like to propose a KIP to expose a metric for log recovery progress.
>> > This metric would let the admins have a way to monitor the log recovery
>> > progress.
>> > Details can be found here:
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-831%3A+Add+metric+for+log+recovery+progress
>> >
>> > Any feedback is appreciated.
>> >
>> > Thank you.
>> > Luke
>>
>