You are viewing a plain text version of this content. The canonical link for it is here.
Posted to jira@kafka.apache.org by "Oleg Kuznetsov (JIRA)" <ji...@apache.org> on 2017/07/17 00:08:01 UTC
[jira] [Comment Edited] (KAFKA-4794) Add access to
OffsetStorageReader from SourceConnector
[ https://issues.apache.org/jira/browse/KAFKA-4794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16089196#comment-16089196 ]
Oleg Kuznetsov edited comment on KAFKA-4794 at 7/17/17 12:07 AM:
-----------------------------------------------------------------
[~jasong35] [~ewencp] Do you have any feedback regarding this KIP?
was (Author: olkuznsmith):
[~jasong35] Do you have any feedback regarding this KIP?
> Add access to OffsetStorageReader from SourceConnector
> ------------------------------------------------------
>
> Key: KAFKA-4794
> URL: https://issues.apache.org/jira/browse/KAFKA-4794
> Project: Kafka
> Issue Type: Improvement
> Components: KafkaConnect
> Affects Versions: 0.10.2.0
> Reporter: Florian Hussonnois
> Priority: Minor
> Labels: needs-kip
>
> Currently the offsets storage is only accessible from SourceTask to able to initialize properly tasks after a restart, a crash or a reconfiguration request.
> To implement more complex connectors that need to track the progression of each task it would helpful to have access to an OffsetStorageReader instance from the SourceConnector.
> In that way, we could have a background thread that could request a tasks reconfiguration based on source offsets.
> This improvement proposal comes from a customer project that needs to periodically scan directories on a shared storage for detecting and for streaming new files into Kafka.
> The connector implementation is pretty straightforward.
> The connector uses a background thread to periodically scan directories. When new inputs files are detected a tasks reconfiguration is requested. Then the connector assigns a file subset to each task.
> Each task stores sources offsets for the last sent record. The source offsets data are:
> - the size of file
> - the bytes offset
> - the bytes size
> Tasks become idle when the assigned files are completed (in : recordBytesOffsets + recordBytesSize = fileBytesSize).
> Then, the connector should be able to track offsets for each assigned file.
> When all tasks has finished the connector can stop them or assigned new files by requesting tasks reconfiguration.
> Moreover, another advantage of monitoring source offsets from the connector is detect slow or failed tasks and if necessary to be able to restart all tasks.
> If you think this improvement is OK, I can work a pull request.
> Thanks,
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)