Posted to jira@kafka.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/12/04 03:19:00 UTC

[jira] [Commented] (KAFKA-7704) kafka.server.ReplicaFetcherManager.MaxLag.Replica metric is reported incorrectly

    [ https://issues.apache.org/jira/browse/KAFKA-7704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16708142#comment-16708142 ] 

ASF GitHub Bot commented on KAFKA-7704:
---------------------------------------

huxihx opened a new pull request #5998: KAFKA-7704: MaxLag.Replica metric is reported incorrectly
URL: https://github.com/apache/kafka/pull/5998
 
 
   On the follower side, when the `LogAppendInfo` built from the leader's fetch response is empty, `fetcherLagStats` records the wrong lag: `nextOffset` is zero in this case, even though an empty append actually means the follower is not lagging. The lag should therefore be set to 0 if `nextOffset` is 0 or `logAppendInfo.lastOffset` is -1.
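
The rule described above can be illustrated with a minimal, self-contained sketch. It is not the code from this pull request or from Kafka itself: the class `FetcherLagSketch` and the methods `buggyLag` and `fixedLag` are hypothetical names, and the sketch assumes the lag is computed as the leader's high watermark minus `nextOffset`, with `nextOffset` derived as `lastOffset + 1`. The high watermark value is borrowed from the issue report purely for illustration.

```java
// Hypothetical illustration only; these names are not part of Kafka's API.
final class FetcherLagSketch {

    // Assumed pre-fix behaviour: lag = leader high watermark - nextOffset.
    // An empty append has lastOffset == -1, so nextOffset == 0 and the
    // whole high watermark is reported as lag.
    public static long buggyLag(long leaderHighWatermark, long lastOffset) {
        long nextOffset = lastOffset + 1;
        return Math.max(0L, leaderHighWatermark - nextOffset);
    }

    // Rule from the PR description: report 0 when nextOffset is 0
    // or lastOffset is -1, because nothing was appended.
    public static long fixedLag(long leaderHighWatermark, long lastOffset) {
        long nextOffset = lastOffset + 1;
        if (lastOffset == -1L || nextOffset == 0L) {
            return 0L;
        }
        return Math.max(0L, leaderHighWatermark - nextOffset);
    }

    public static void main(String[] args) {
        long highWatermark = 99043947579L; // illustrative value from the report

        // Idle partition, empty append: the two rules disagree.
        System.out.println(buggyLag(highWatermark, -1L));
        System.out.println(fixedLag(highWatermark, -1L));

        // Non-empty append that ends 10 offsets behind the high watermark.
        System.out.println(fixedLag(highWatermark, highWatermark - 11));
    }
}
```

Under these assumptions, an idle partition contributes 0 to the maximum lag instead of its full offset, while partitions that actually received data are measured as before.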
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> kafka.server.ReplicaFetcherManager.MaxLag.Replica metric is reported incorrectly
> --------------------------------------------------------------------------------
>
>                 Key: KAFKA-7704
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7704
>             Project: Kafka
>          Issue Type: Bug
>          Components: metrics
>    Affects Versions: 2.1.0
>            Reporter: Yu Yang
>            Priority: Major
>         Attachments: Screen Shot 2018-12-03 at 4.33.35 PM.png
>
>
> We recently deployed Kafka 2.1 and noticed a jump in the kafka.server.ReplicaFetcherManager.MaxLag.Replica metric. At the same time, there are no under-replicated partitions in the cluster. 
> Initial analysis shows that Kafka 2.1.0 does not report the metric correctly for topics that have no incoming traffic right now but had traffic earlier. For those topics, ReplicaFetcherManager considers maxLag to be the latest offset. 
> For instance, we have a topic named `test_topic`: 
> {code}
> [root@kafkabroker03002:/mnt/kafka/test_topic-0]# ls -l
> total 8
> -rw-rw-r-- 1 kafka kafka 10485760 Dec  4 00:13 00000000099043947579.index
> -rw-rw-r-- 1 kafka kafka        0 Sep 23 03:01 00000000099043947579.log
> -rw-rw-r-- 1 kafka kafka       10 Dec  4 00:13 00000000099043947579.snapshot
> -rw-rw-r-- 1 kafka kafka 10485756 Dec  4 00:13 00000000099043947579.timeindex
> -rw-rw-r-- 1 kafka kafka        4 Dec  4 00:13 leader-epoch-checkpoint
> {code}
> Kafka reports ReplicaFetcherManager.MaxLag.Replica as 99043947579
>  !Screen Shot 2018-12-03 at 4.33.35 PM.png|width=720px! 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)