Posted to dev@kafka.apache.org by "Randall Hauch (JIRA)" <ji...@apache.org> on 2017/05/22 20:14:04 UTC

[jira] [Comment Edited] (KAFKA-3821) Allow Kafka Connect source tasks to produce offset without writing to topics

    [ https://issues.apache.org/jira/browse/KAFKA-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16020100#comment-16020100 ] 

Randall Hauch edited comment on KAFKA-3821 at 5/22/17 8:13 PM:
---------------------------------------------------------------

The problem with the connector directly using {{OffsetStorageWriter}} is that the connector cannot guarantee the order of its offset writes relative to the source records that Kafka Connect is already processing. In my use cases, the offset/partition should be updated as part of the sequence of normal source records, and that order must be maintained.

The best and simplest example is a connector that wants to record that it is still making progress in its source but, for whatever reason, is not producing any source records.

But imagine a case where the connector has just recorded an offset via {{OffsetStorageWriter}} and then immediately produces a new {{SourceRecord}} with a new offset. This order is important: if the offset of the {{SourceRecord}} gets written before the connector's direct call, the older, directly written offset would end up stored last.

Of course, the opposite case is bad, too: imagine the connector producing a {{SourceRecord}} that is enqueued and not immediately processed, while the connector progresses a bit and wants to record its new offset. If it did the latter by explicitly writing to the {{OffsetStorageWriter}}, that write might happen before the offset in the {{SourceRecord}} is captured.
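
To make that second race concrete, here is a minimal, self-contained sketch. It is a toy model and does not use Kafka Connect classes: {{storedOffset}}, {{pendingRecordOffsets}}, {{writeOffsetDirectly}} and {{frameworkDrainsQueue}} are hypothetical names standing in for the offset store, the queue of records returned from poll, a direct write by the connector, and the framework's later processing.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical, simplified model; none of these members are Kafka Connect APIs.
public class OffsetRaceSketch {
    // Stands in for the offset store: only the most recently written offset survives.
    static long storedOffset = -1;

    // Stands in for records already handed over but not yet processed by the framework.
    static final Deque<Long> pendingRecordOffsets = new ArrayDeque<>();

    // Path 1: the connector writes an offset directly, bypassing the record queue.
    static void writeOffsetDirectly(long offset) {
        storedOffset = offset;
    }

    // Path 2: the framework later drains the queue and stores each record's offset.
    static void frameworkDrainsQueue() {
        while (!pendingRecordOffsets.isEmpty()) {
            storedOffset = pendingRecordOffsets.poll();
        }
    }

    static long run() {
        storedOffset = -1;
        pendingRecordOffsets.clear();
        pendingRecordOffsets.add(1L); // record with offset 1 is enqueued, not yet processed
        writeOffsetDirectly(2L);      // connector progresses and records offset 2 directly
        frameworkDrainsQueue();       // framework now captures the older record's offset
        return storedOffset;
    }

    public static void main(String[] args) {
        System.out.println("stored offset = " + run()); // prints "stored offset = 1"
    }
}
```

In this model the stored offset ends at 1 even though the connector had already reached 2, because the two paths are not ordered relative to each other.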

The bottom line is that connectors need to be able to specify the order of {{SourceRecord}} objects and offset updates, and that likely means they all need to be sent through the same poll mechanism.
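
As a rough illustration of what "the same poll mechanism" could look like, the sketch below sends records and offset-only updates through one ordered list. This is only an assumption about one possible shape, not an existing or proposed Kafka Connect API: the {{Entry}} type, its {{isOffsetOnly}} check, and the {{process}} method are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a single ordered stream; not a Kafka Connect API.
public class OrderedPollSketch {
    // One entry returned by poll: either a real record or an offset-only update.
    static final class Entry {
        final long offset;
        final String value; // null marks an offset-only update with nothing for a topic

        Entry(long offset, String value) {
            this.offset = offset;
            this.value = value;
        }

        boolean isOffsetOnly() {
            return value == null;
        }
    }

    static final List<String> topic = new ArrayList<>();
    static long storedOffset = -1;

    // The framework handles entries strictly in the order the connector returned them.
    static void process(List<Entry> polled) {
        for (Entry e : polled) {
            if (!e.isOffsetOnly()) {
                topic.add(e.value); // only real records reach the topic
            }
            storedOffset = e.offset; // every entry advances the offset, in order
        }
    }

    static long run() {
        topic.clear();
        storedOffset = -1;
        List<Entry> polled = new ArrayList<>();
        polled.add(new Entry(1, "row-a"));
        polled.add(new Entry(2, null)); // progress made, but nothing to write
        polled.add(new Entry(3, "row-b"));
        process(polled);
        return storedOffset;
    }

    public static void main(String[] args) {
        System.out.println("stored offset = " + run());    // prints "stored offset = 3"
        System.out.println("topic size = " + topic.size()); // prints "topic size = 2"
    }
}
```

Because there is only one ordered channel, the connector alone decides where an offset update falls relative to its records.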


> Allow Kafka Connect source tasks to produce offset without writing to topics
> ----------------------------------------------------------------------------
>
>                 Key: KAFKA-3821
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3821
>             Project: Kafka
>          Issue Type: Improvement
>          Components: KafkaConnect
>    Affects Versions: 0.9.0.1
>            Reporter: Randall Hauch
>              Labels: needs-kip
>
> Provide a way for a {{SourceTask}} implementation to record a new offset for a given partition without necessarily writing a source record to a topic.
> Consider a connector task that uses the same offset when producing an unknown number of {{SourceRecord}} objects (e.g., it is taking a snapshot of a database). Once the task completes those records, the connector wants to update the offsets (e.g., the snapshot is complete) but has no more records to be written to a topic. With this change, the task could simply supply an updated offset.
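
The snapshot scenario in the description above can be modeled with a small self-contained sketch. It is a toy model under stated assumptions and not the Kafka Connect API: {{emitRecord}}, {{emitOffsetOnly}} and the {{snapshot}} offset key are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical model of the requested behaviour; not a Kafka Connect API.
public class SnapshotOffsetSketch {
    static final List<String> topic = new ArrayList<>();
    static Map<String, String> storedOffset = Collections.emptyMap();

    // A normal source record: written to the topic, and its offset is stored.
    static void emitRecord(String row, Map<String, String> offset) {
        topic.add(row);
        storedOffset = offset;
    }

    // The requested addition: store an offset without writing anything to a topic.
    static void emitOffsetOnly(Map<String, String> offset) {
        storedOffset = offset;
    }

    static Map<String, String> run(List<String> rows) {
        topic.clear();
        storedOffset = Collections.emptyMap();
        // Every snapshot row carries the same offset while the snapshot is running.
        Map<String, String> inProgress = Collections.singletonMap("snapshot", "in-progress");
        for (String row : rows) {
            emitRecord(row, inProgress);
        }
        // No rows are left, but the task still needs to mark the snapshot as done.
        emitOffsetOnly(Collections.singletonMap("snapshot", "complete"));
        return storedOffset;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("row-1", "row-2", "row-3"))); // prints "{snapshot=complete}"
        System.out.println(topic.size()); // prints "3"
    }
}
```

Without the offset-only step, the stored offset in this model would stay at "in-progress" until some later record happened to be produced.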
