Posted to jira@kafka.apache.org by "Randall Hauch (Jira)" <ji...@apache.org> on 2020/09/29 00:21:00 UTC

[jira] [Commented] (KAFKA-9546) Make FileStreamSourceTask extendable with generic streams

    [ https://issues.apache.org/jira/browse/KAFKA-9546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17203563#comment-17203563 ] 

Randall Hauch commented on KAFKA-9546:
--------------------------------------

[~galyo], thanks for the suggestion and the PR.

I've added the `needs-kip` label, because the `FileStreamSourceConnector` is part of the Connect API, even though it is intentionally just an example connector that helps demonstrate Connect. Because a KIP is required, I question whether changing this connector is really worth it. And because these file connectors are the only ones that ship with AK, extending them will undoubtedly create issues if your extension is installed into a different version of AK than the one with which it was compiled.

If you're providing a customized task class, could you not just provide your own `SourceConnector` class? You'd have a lot more control, and you'd have much more freedom to deploy your connector into nearly any version of a Kafka Connect cluster installation. (The only limitation would be which of the Connect APIs you chose to use, such as the use of headers.)

As such, I don't think this change is worth the added complication, either to the examples or to your connector.
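
As a rough illustration of the standalone approach suggested above, the stream-handling part of a custom connector can be quite small. The sketch below uses only the JDK; `DecodedLineReader` and `StreamDecoder` are hypothetical names, not part of Kafka or the file connectors. It uses GZIP purely because `GZIPInputStream` ships with the JDK, whereas ZSTD would need a third-party library. A real task would presumably call something like this from its own `SourceTask.poll()` and turn each line into a record, which is omitted here.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper (not a Kafka class): reads newline-delimited lines from a decoded stream. */
final class DecodedLineReader {

    /** Hypothetical hook: wraps the raw stream in a decoder such as GZIPInputStream. */
    @FunctionalInterface
    interface StreamDecoder {
        InputStream wrap(InputStream raw) throws IOException;
    }

    /** Reads all lines from the decoded stream, closing it when done. */
    static List<String> readLines(InputStream raw, StreamDecoder decoder) {
        List<String> lines = new ArrayList<>();
        try (InputStream in = raw;
             BufferedReader reader = new BufferedReader(
                     new InputStreamReader(decoder.wrap(in), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            // Unchecked so callers can pass method references without extra handling.
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    /** Demo helper: gzip-compresses a string in memory. */
    static byte[] gzip(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] compressed = gzip("first\nsecond\n");
        // Any InputStream-wrapping decoder can be swapped in for GZIPInputStream::new.
        List<String> lines = readLines(new ByteArrayInputStream(compressed), GZIPInputStream::new);
        lines.forEach(System.out::println);
    }
}
```

Because the decoder is passed in by the caller, the choice of compression format stays entirely inside the custom connector and does not depend on the classes that ship with AK.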

> Make FileStreamSourceTask extendable with generic streams
> ---------------------------------------------------------
>
>                 Key: KAFKA-9546
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9546
>             Project: Kafka
>          Issue Type: Improvement
>          Components: KafkaConnect
>            Reporter: Csaba Galyo
>            Assignee: Csaba Galyo
>            Priority: Major
>              Labels: connect-api, needs-kip
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> Use case: I want to read a ZIP compressed text file with a file connector and send it to Kafka.
> Currently, we have FileStreamSourceConnector which reads a \n delimited text file. This connector always returns a task of type FileStreamSourceTask.
> The FileStreamSourceTask reads from stdin or opens a file InputStream. The issue with this approach is that the input needs to be a text file, otherwise it won't work.
> The code should be modified so that users could change the default InputStream to e.g. ZipInputStream, or any other format. The code is currently written in such a way that it's not possible to extend it; we cannot use a different input stream.
> See example here where the code got copy-pasted just so it could read from a ZstdInputStream (which reads ZSTD compressed files): [https://github.com/gcsaba2/kafka-zstd/tree/master/src/main/java/org/apache/kafka/connect/file]
>  
> I suggest 2 changes:
>  # FileStreamSourceConnector should be extendable to return tasks of different types. These types would be input by the user through the configuration map
>  # FileStreamSourceTask should be modified so it could be extended and child classes could define different input streams.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)