Posted to jira@kafka.apache.org by "Vincent Giroux (Jira)" <ji...@apache.org> on 2021/10/12 18:34:00 UTC

[jira] [Updated] (KAFKA-13370) Offset commit failure percentage metric is not computed correctly (regression)

     [ https://issues.apache.org/jira/browse/KAFKA-13370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vincent Giroux updated KAFKA-13370:
-----------------------------------
    Summary: Offset commit failure percentage metric is not computed correctly (regression)  (was: Offset commit failure percentage incorrect (regression))

> Offset commit failure percentage metric is not computed correctly (regression)
> ------------------------------------------------------------------------------
>
>                 Key: KAFKA-13370
>                 URL: https://issues.apache.org/jira/browse/KAFKA-13370
>             Project: Kafka
>          Issue Type: Bug
>          Components: KafkaConnect, metrics
>    Affects Versions: 2.8.0
>         Environment: Confluent Platform Helm Chart (v6.2.0)
>            Reporter: Vincent Giroux
>            Priority: Minor
>             Fix For: 2.8.0
>
>
> There seems to have been a regression, starting in version 2.8.0, in the way the *offset-commit-* metrics are calculated for *source* Kafka Connect connectors.
> Before this version, any timeout or interruption while trying to commit offsets for source connectors (e.g. MM2 MirrorSourceConnector) was correctly flagged as an offset commit failure (i.e. the *offset-commit-failure-percentage* metric would be non-zero). Since version 2.8.0, these errors are counted as successes.
> After digging through the code, the commit that introduced this bug appears to be this one: [https://github.com/apache/kafka/commit/047ad654da7903f3903760b0e6a6a58648ca7715]
> I believe it might be a mistake to have removed the boolean *success* argument from the *recordCommit* method of the *WorkerTask* class (it was deemed redundant given the Throwable *error* argument) and to rely only on the presence of a non-null error to decide whether a commit succeeded or failed. This is because in the *commitOffsets* method of the *WorkerSourceTask* class, there are multiple cases where an exception object is either not available or is not passed to the *recordCommitFailure* method, e.g.:
>  * *Timeout #1* : [https://github.com/apache/kafka/blob/2.8/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSourceTask.java#L519] 
>  * *Timeout #2* : [https://github.com/apache/kafka/blob/2.8/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSourceTask.java#L584] 
>  * *Interruption* : [https://github.com/apache/kafka/blob/2.8/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSourceTask.java#L529] 
>  * *Unserializable offset* : [https://github.com/apache/kafka/blob/2.8/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSourceTask.java#L562] 
>  
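To illustrate the failure mode described above, here is a minimal, self-contained Java sketch. It is not the Kafka Connect code: the class CommitMetricsSketch, the CommitStats counter, and both recordCommit* helpers are hypothetical stand-ins, and the ratio is a simplified model of the real metric. It only shows why inferring the outcome from a null error misreports a failure that carries no exception object, such as a timeout.

```java
public class CommitMetricsSketch {

    /** Hypothetical stand-in for a task's commit counters; not Kafka's real metrics class. */
    static final class CommitStats {
        private int successes;
        private int failures;

        void recordSuccess() { successes++; }
        void recordFailure() { failures++; }

        /** Failed commits as a fraction of all recorded commits (0.0 when none were recorded). */
        double failureRatio() {
            int total = successes + failures;
            return total == 0 ? 0.0 : (double) failures / total;
        }
    }

    /** Assumed post-change shape: the outcome is inferred from the error alone. */
    static void recordCommitInferred(CommitStats stats, Throwable error) {
        if (error == null) {
            // A failure reported without an exception object ends up here.
            stats.recordSuccess();
        } else {
            stats.recordFailure();
        }
    }

    /** Assumed pre-change shape: the caller states the outcome explicitly. */
    static void recordCommitExplicit(CommitStats stats, boolean success, Throwable error) {
        if (success) {
            stats.recordSuccess();
        } else {
            stats.recordFailure();
        }
    }

    public static void main(String[] args) {
        CommitStats inferred = new CommitStats();
        CommitStats explicit = new CommitStats();

        // Simulated timeout: the commit failed, but there is no exception to pass along.
        recordCommitInferred(inferred, null);
        recordCommitExplicit(explicit, false, null);

        System.out.println("inferred failure ratio: " + inferred.failureRatio()); // 0.0
        System.out.println("explicit failure ratio: " + explicit.failureRatio()); // 1.0
    }
}
```

In this sketch the two variants agree whenever a non-null error is supplied; they diverge only when a failure is reported with a null error, which is the situation the four linked call sites appear to be in.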



--
This message was sent by Atlassian Jira
(v8.3.4#803005)