Posted to issues@spark.apache.org by "Yuanjian Li (JIRA)" <ji...@apache.org> on 2018/11/05 03:44:00 UTC

[jira] [Commented] (SPARK-25937) Support user-defined schema in Kafka Source & Sink

    [ https://issues.apache.org/jira/browse/SPARK-25937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16674630#comment-16674630 ] 

Yuanjian Li commented on SPARK-25937:
-------------------------------------

The problem described here needs to be resolved, but the FileFormat approach doesn't sound like the best way to achieve it. Have we considered using Encoder or DataSourceV2?

> Support user-defined schema in Kafka Source & Sink
> --------------------------------------------------
>
>                 Key: SPARK-25937
>                 URL: https://issues.apache.org/jira/browse/SPARK-25937
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 2.4.0
>            Reporter: Jackey Lee
>            Priority: Major
>
>     Kafka Source & Sink is widely used in Spark and is among the most common connectors in streaming production environments. At present, however, both the Kafka Source and Sink use a fixed schema, which forces users to do data conversion when reading from and writing to Kafka. So why not use FileFormat to do this, just like Hive does?
>     Flink has implemented JSON/CSV/Avro extended Kafka Sources & Sinks; we can support the same in Spark.
> *Main Goals:*
> 1. Provide a Source and Sink that support a user-defined schema. Users can then read from and write to Kafka directly in their programs without additional data conversion.
> 2. Provide a read/write mechanism based on FileFormat. Since a user's data conversion resembles FileFormat's read and write process, we can provide a similar mechanism that offers common read/write format conversions and also allows users to customize them.
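For context on the conversion burden described above: under the fixed schema, the Kafka source exposes each record's payload as raw bytes in the `value` column, and users must deserialize it themselves before they can work with typed fields. A minimal, Spark-free sketch of that manual step (the JSON payload format and the `Event` field names are hypothetical, purely for illustration):

```python
import json
from dataclasses import dataclass

# Hypothetical user-defined schema for the Kafka message payload.
@dataclass
class Event:
    user_id: int
    action: str

def decode_value(raw: bytes) -> Event:
    """Deserialize the raw Kafka 'value' bytes into a typed record.

    This is the conversion users currently write by hand; the proposal
    would let the source apply such a schema automatically.
    """
    obj = json.loads(raw.decode("utf-8"))
    return Event(user_id=int(obj["user_id"]), action=str(obj["action"]))

# Under the fixed schema, a record's value arrives as opaque bytes.
record_value = b'{"user_id": 42, "action": "click"}'
event = decode_value(record_value)
```

With a user-defined schema on the source itself, this boilerplate would move into the connector, much as FileFormat implementations handle JSON/CSV/Avro parsing for file-based sources.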



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org