Posted to jira@kafka.apache.org by "Vincent Giroux (Jira)" <ji...@apache.org> on 2021/10/12 18:34:00 UTC
[jira] [Updated] (KAFKA-13370) Offset commit failure percentage
metric is not computed correctly (regression)
[ https://issues.apache.org/jira/browse/KAFKA-13370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vincent Giroux updated KAFKA-13370:
-----------------------------------
Summary: Offset commit failure percentage metric is not computed correctly (regression) (was: Offset commit failure percentage incorrect (regression))
> Offset commit failure percentage metric is not computed correctly (regression)
> ------------------------------------------------------------------------------
>
> Key: KAFKA-13370
> URL: https://issues.apache.org/jira/browse/KAFKA-13370
> Project: Kafka
> Issue Type: Bug
> Components: KafkaConnect, metrics
> Affects Versions: 2.8.0
> Environment: Confluent Platform Helm Chart (v6.2.0)
> Reporter: Vincent Giroux
> Priority: Minor
> Fix For: 2.8.0
>
>
> There seems to have been a regression in the way the *offset-commit-** metrics are calculated for *source* Kafka Connect connectors since version 2.8.0.
> Before this version, any timeout or interruption while trying to commit offsets for source connectors (e.g. MM2 MirrorSourceConnector) would be correctly flagged as an offset commit failure (i.e. the *offset-commit-failure-percentage* metric would be non-zero). Since version 2.8.0, these errors are counted as successes.
> After digging through the code, the commit that introduced this bug appears to be this one: [https://github.com/apache/kafka/commit/047ad654da7903f3903760b0e6a6a58648ca7715]
> I believe it might be a mistake to have removed the boolean *success* argument from the *recordCommit* method of the *WorkerTask* class (the argument was deemed redundant because of the Throwable *error* argument) and to rely only on the presence of a non-null error to decide whether a commit succeeded or failed. This is because in the *commitOffsets* method of the *WorkerSourceTask* class, there are multiple cases where an exception object is either not available or is not passed to the *recordCommitFailure* method, e.g.:
> * *Timeout #1* : [https://github.com/apache/kafka/blob/2.8/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSourceTask.java#L519]
> * *Timeout #2* : [https://github.com/apache/kafka/blob/2.8/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSourceTask.java#L584]
> * *Interruption* : [https://github.com/apache/kafka/blob/2.8/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSourceTask.java#L529]
> * *Unserializable offset* : [https://github.com/apache/kafka/blob/2.8/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSourceTask.java#L562]
>
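The suspected mechanism can be illustrated with a minimal, self-contained sketch. It is not the Kafka Connect code: the class CommitMetricSketch, its two recording methods and the 0-100 percentage calculation are hypothetical stand-ins, written only to show how a failure path that has no exception object ends up counted as a success once the outcome is inferred from the error argument alone.

```java
/** Hypothetical stand-in for a task's commit counters; not the real Kafka Connect classes. */
class CommitMetricSketch {
    private int successes;
    private int failures;

    /** Shape described in the report for 2.8.0: the outcome is inferred from the error alone. */
    void recordCommitInferred(long durationMs, Throwable error) {
        // durationMs is ignored in this sketch; only the success/failure decision matters.
        if (error == null) {
            successes++;
        } else {
            failures++;
        }
    }

    /** Earlier shape described in the report: the caller states the outcome explicitly. */
    void recordCommitExplicit(long durationMs, boolean success, Throwable error) {
        if (success) {
            successes++;
        } else {
            failures++;
        }
    }

    /** Simplified failure share on a 0-100 scale (an assumption made for readability). */
    double failurePercentage() {
        int total = successes + failures;
        return total == 0 ? 0.0 : 100.0 * failures / total;
    }

    public static void main(String[] args) {
        // A timeout path that has no exception object to hand over.
        CommitMetricSketch inferred = new CommitMetricSketch();
        inferred.recordCommitInferred(5000L, null);
        System.out.println("inferred: " + inferred.failurePercentage() + "%"); // inferred: 0.0%

        // The same timeout, reported with an explicit success flag.
        CommitMetricSketch explicit = new CommitMetricSketch();
        explicit.recordCommitExplicit(5000L, false, null);
        System.out.println("explicit: " + explicit.failurePercentage() + "%"); // explicit: 100.0%
    }
}
```

In this sketch the same timed-out commit yields a 0% failure share when the outcome is inferred from a null error, and 100% when the caller passes an explicit flag, which matches the behaviour change described in the report.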
--
This message was sent by Atlassian Jira
(v8.3.4#803005)