Posted to jira@kafka.apache.org by "Randall Hauch (Jira)" <ji...@apache.org> on 2020/09/29 15:45:00 UTC

[jira] [Resolved] (KAFKA-9546) Make FileStreamSourceTask extendable with generic streams

     [ https://issues.apache.org/jira/browse/KAFKA-9546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Randall Hauch resolved KAFKA-9546.
----------------------------------
    Resolution: Won't Fix

I'm going to close this as WONTFIX, per my previous comment.

> Make FileStreamSourceTask extendable with generic streams
> ---------------------------------------------------------
>
>                 Key: KAFKA-9546
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9546
>             Project: Kafka
>          Issue Type: Improvement
>          Components: KafkaConnect
>            Reporter: Csaba Galyo
>            Assignee: Csaba Galyo
>            Priority: Major
>              Labels: connect-api, needs-kip
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> Use case: I want to read a ZIP-compressed text file with a file connector and send it to Kafka.
> Currently, we have FileStreamSourceConnector, which reads a newline-delimited text file. This connector always returns a task of type FileStreamSourceTask.
> The FileStreamSourceTask reads from standard input or opens a file InputStream. The problem with this approach is that the input must be a plain text file; anything else will not work.
> The code should be modified so that users can replace the default InputStream with, e.g., a ZipInputStream or a stream for any other format. The code is currently written in such a way that it cannot be extended, so we cannot use a different input stream.
> See this example, where the code was copy-pasted just so it could read from a ZstdInputStream (which reads ZSTD-compressed files): [https://github.com/gcsaba2/kafka-zstd/tree/master/src/main/java/org/apache/kafka/connect/file]
>  
> I suggest 2 changes:
>  # FileStreamSourceConnector should be extendable so that it can return tasks of different types. The user would specify the task type through the configuration map.
>  # FileStreamSourceTask should be modified so that it can be extended and subclasses can define different input streams.
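
The second suggestion above can be illustrated with a minimal sketch. This is not Kafka's FileStreamSourceTask and not a proposed patch: LineReadingTask, GzipLineReadingTask, StreamHookSketch, and the wrap() hook are hypothetical names invented for illustration. GZIP stands in for ZIP or ZSTD only because GZIPInputStream ships with the JDK. The base class reads newline-delimited text and exposes one overridable method that lets a subclass decorate the raw stream.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical stand-in for a line-reading task; NOT Kafka's FileStreamSourceTask.
class LineReadingTask {
    // Hypothetical hook: subclasses override this to decorate the raw stream.
    protected InputStream wrap(InputStream raw) throws IOException {
        return raw; // default: treat the input as plain text
    }

    // Reads every newline-delimited line from the (possibly decorated) stream.
    List<String> readLines(InputStream raw) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(wrap(raw), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}

// Hypothetical subclass: only the stream decoration changes.
class GzipLineReadingTask extends LineReadingTask {
    @Override
    protected InputStream wrap(InputStream raw) throws IOException {
        return new GZIPInputStream(raw);
    }
}

class StreamHookSketch {
    // Helper that compresses a string in memory so the example needs no files.
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] compressed = gzip("first\nsecond\n");
        List<String> lines = new GzipLineReadingTask()
                .readLines(new ByteArrayInputStream(compressed));
        System.out.println(lines); // prints [first, second]
    }
}
```

With a hook of this kind, a subclass supplies only the stream decoration and inherits the line-reading logic, which is the copy-paste the description is trying to avoid.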



--
This message was sent by Atlassian Jira
(v8.3.4#803005)