Posted to jira@kafka.apache.org by "Chris Egerton (Jira)" <ji...@apache.org> on 2020/05/13 18:11:00 UTC

[jira] [Comment Edited] (KAFKA-9982) [kafka-connect] Source connector does not guarantee at least once delivery

    [ https://issues.apache.org/jira/browse/KAFKA-9982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105942#comment-17105942 ] 

Chris Egerton edited comment on KAFKA-9982 at 5/13/20, 6:10 PM:
----------------------------------------------------------------

The producers the framework uses to write data from source tasks to Kafka are [configured conservatively|https://github.com/apache/kafka/blob/9bc96d54f8953d190a1fb6478a0656f049ee3b32/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/Worker.java#L557-L563] to prevent multiple concurrent in-flight requests, which might compromise the ordering of the records.
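As a rough sketch of what "configured conservatively" means here (the authoritative keys and values are in Worker.java at the linked commit; the class and helper below are illustrative, not Connect runtime code):

```java
import java.util.Properties;

// Illustrative sketch of a conservatively configured producer, in the spirit
// of the Worker.java settings linked above; not the actual Connect runtime code.
public class ConservativeProducerProps {
    public static Properties build(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        // Require acknowledgement from all in-sync replicas before a send succeeds.
        props.setProperty("acks", "all");
        // A single in-flight request per connection, so retries cannot reorder records.
        props.setProperty("max.in.flight.requests.per.connection", "1");
        // Retry sends rather than silently dropping records.
        props.setProperty("retries", String.valueOf(Integer.MAX_VALUE));
        return props;
    }
}
```

With only one in-flight request per connection, a retried send cannot overtake a later record, which is what preserves ordering.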

As of the fix for https://issues.apache.org/jira/browse/KAFKA-8586, the framework will cease processing records from a source task if it fails to send a record to Kafka.

The framework does use an entirely different producer to write source offsets to Kafka, but no offset is written to Kafka unless the record it corresponds to has been acknowledged by the broker and has safely made it to Kafka.
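As a rough model of that gating (the class and method names here are invented for illustration; the real logic lives in WorkerSourceTask), an offset only becomes eligible for commit once the producer callback has reported the corresponding record as acknowledged:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified model of how offset commits are gated on producer acks.
// Illustrative only; not the actual Connect runtime classes.
public class OffsetGate {
    private final Map<String, Long> ackedOffsets = new ConcurrentHashMap<>();

    // Called from the producer send callback once the broker has ack'd the record.
    public void recordAcked(String sourcePartition, long sourceOffset) {
        ackedOffsets.merge(sourcePartition, sourceOffset, Math::max);
    }

    // Only offsets whose records have been ack'd are eligible for commit;
    // offsets for in-flight records are never handed to the offset producer.
    public Map<String, Long> committableOffsets() {
        return Map.copyOf(ackedOffsets);
    }
}
```

Under this model, a rebalance or restart can only ever replay records whose offsets were not yet ack'd, i.e. duplicates, not losses.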

[~q.xu] based on the source code for the worker, I don't think this analysis is correct. Have you observed this behavior yourself?



> [kafka-connect] Source connector does not guarantee at least once delivery
> --------------------------------------------------------------------------
>
>                 Key: KAFKA-9982
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9982
>             Project: Kafka
>          Issue Type: Bug
>          Components: KafkaConnect
>    Affects Versions: 2.5.0
>            Reporter: Qinghui Xu
>            Priority: Major
>
> In kafka-connect runtime, the WorkerSourceTask is responsible for sending records to the destination topics and managing the source offset commit. Committed offsets are then used later for recovery of tasks during rebalance or restart.
> But there are two concerns when looking into the WorkerSourceTask implementation:
>  * When the producer fails to send records, there is no retry; the task simply skips the offset commit and executes the next loop (polling for new records)
>  * The offset commit and the actual sending of records over the network are in fact asynchronous, which means the offset commit could happen before records are received by the brokers, and a rebalance/restart in this gap could lead to message loss.
> The conclusion is thus that the source connector does not support at-least-once semantics by default (without the plugin implementation making an extra effort itself). I consider this a bug.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)