Posted to jira@kafka.apache.org by "Chris Egerton (Jira)" <ji...@apache.org> on 2022/07/17 17:45:00 UTC

[jira] [Comment Edited] (KAFKA-14079) Source task develops memory leak if "error.tolerance" is set to "all"

    [ https://issues.apache.org/jira/browse/KAFKA-14079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17567700#comment-17567700 ] 

Chris Egerton edited comment on KAFKA-14079 at 7/17/22 5:44 PM:
----------------------------------------------------------------

[~cshannon] is it correct to say that, in addition to leaking resources, another consequence of this bug is that source tasks become unable to commit some or all source offsets? It might be worth updating the title to reflect that since, in addition to the increased memory utilization we'd expect from an ever-growing deque of records, users may also discover this issue by observing that the offsets for a source connector have become stuck on one or more source partitions. Thoughts?


was (Author: chrisegerton):
[~cshannon] is it correct to say that, in addition to leaking resources, another consequence of this bug is that source tasks become unable to commit some or all source offsets? It might be worth updating the title to reflect that since, in addition to the increased memory utilization we'd expect from an ever-growing deque of records, users may also discover this issue by observing that the offsets for a source connector have become stuck on one or more source partitions.

> Source task develops memory leak if "error.tolerance" is set to "all"
> ---------------------------------------------------------------------
>
>                 Key: KAFKA-14079
>                 URL: https://issues.apache.org/jira/browse/KAFKA-14079
>             Project: Kafka
>          Issue Type: Bug
>          Components: KafkaConnect
>    Affects Versions: 3.2.0
>            Reporter: Christopher L. Shannon
>            Priority: Major
>             Fix For: 3.2.1
>
>
> KAFKA-13348 added the ability to ignore producer exceptions by setting {{error.tolerance}} to {{all}}. When this is set to {{all}}, a null record metadata is passed to commitRecord() and the task continues.
> The issue is that records are tracked by {{SubmittedRecords}}, and the first time an error happens the code does not remove the failed record from {{SubmittedRecords}} before calling commitRecord().
> This leads to a memory leak because the algorithm that removes acked records from the internal map [looks|https://github.com/apache/kafka/blob/3.2.0/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/SubmittedRecords.java#L177] at the head of the Deque in which the records are tracked, and if it sees that the record is unacked it will not process any more removals. As a result, all new records that go through the task continue to be added and are never removed, until an OOM error occurs.
> The fix is to make sure the failed record is removed before calling commitRecord(). Metrics also need to be updated.
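
For illustration, here is a minimal, self-contained Java sketch of the head-of-deque removal behavior described above. The class and field names are hypothetical and simplified; this is not the actual SubmittedRecords code, only a model of the described logic: removal stops at the first unacked record, so a failed record that is never marked as handled blocks every later record from being dequeued.

{code:java}
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical, simplified model of the head-of-deque removal logic
// described in the issue; names do not match the real SubmittedRecords class.
public class DequeLeakSketch {

    static class Record {
        final int id;
        boolean acked;
        Record(int id) { this.id = id; }
    }

    // Mirrors the described behavior: records are only removed from the head
    // of the deque, and removal stops at the first unacked record.
    static void removeCompletedRecords(Deque<Record> submitted) {
        while (!submitted.isEmpty() && submitted.peekFirst().acked) {
            submitted.pollFirst();
        }
    }

    public static void main(String[] args) {
        Deque<Record> submitted = new ArrayDeque<>();

        Record failed = new Record(0);   // producer error: never acked, never removed
        submitted.add(failed);

        for (int i = 1; i <= 5; i++) {   // later records are acked normally
            Record r = new Record(i);
            r.acked = true;
            submitted.add(r);
        }

        removeCompletedRecords(submitted);

        // The unacked head blocks all removals, so the deque keeps growing.
        System.out.println("Records still retained: " + submitted.size()); // prints 6
    }
}
{code}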



--
This message was sent by Atlassian Jira
(v8.20.10#820010)