Posted to issues@flink.apache.org by "Fabian Hueske (JIRA)" <ji...@apache.org> on 2018/07/16 09:51:00 UTC

[jira] [Commented] (FLINK-9813) Build xTableSource from Avro schemas

    [ https://issues.apache.org/jira/browse/FLINK-9813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16545012#comment-16545012 ] 

Fabian Hueske commented on FLINK-9813:
--------------------------------------

Hi [~flacombe],

I am not sure if I understand the proposal correctly.

IMO, CSV and Avro are two different data formats and serialization schemas. CSV stores rows with a flat schema as plain text, with values separated by commas (although our {{CsvTableSource}} also supports other delimiters). Avro supports nested structures and serializes rows in a binary format. Hence, I don't see how we could build a {{CsvTableSource}} that supports Avro.
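
To make the flat-versus-nested mismatch concrete, here is a minimal sketch in plain Java. It uses only the standard library, not Flink or Avro, and {{FlatVsNested}} and {{toCsvLine}} are hypothetical names chosen for illustration. A row of scalar values maps directly to one delimited line, whereas a row holding a nested record has no single-cell representation and is rejected here.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class FlatVsNested {

    // Writes one row as a delimited line. Nested values (maps or lists)
    // are rejected because a flat CSV cell cannot hold them.
    public static String toCsvLine(List<?> row, String delimiter) {
        return row.stream()
                .map(value -> {
                    if (value instanceof Map || value instanceof List) {
                        throw new IllegalArgumentException(
                                "nested value has no flat CSV cell: " + value);
                    }
                    return String.valueOf(value);
                })
                .collect(Collectors.joining(delimiter));
    }

    public static void main(String[] args) {
        // A flat row: every field is a scalar.
        List<Object> flat = Arrays.asList("alice", 42, true);
        System.out.println(toCsvLine(flat, ",")); // prints alice,42,true

        // A nested row, as an Avro record could contain.
        List<Object> nested =
                Arrays.asList("alice", Collections.singletonMap("city", "Berlin"));
        try {
            toCsvLine(nested, ",");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Rejecting nested values is only one possible policy; flattening them into extra columns would be another, but either way the CSV side needs a decision that an Avro schema alone does not make.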

Are you suggesting a {{TableSource}} that reads Avro files?
Btw., we are currently separating connectors (file system, Kafka, Kinesis) from formats (Avro, CSV, JSON, ORC, Parquet) to make them easier to combine. For example, Avro files would be supported by combining the file system connector with the Avro format (see FLINK-8558).
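
The idea of combining connectors and formats freely can be sketched in plain Java as follows. This is only an illustration under stated assumptions, not the design from FLINK-8558: the {{Connector}} and {{Format}} interfaces and the {{read}} helper are hypothetical names, and no Flink classes are used.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ConnectorFormatSketch {

    // Hypothetical: a connector only knows where the raw records come from.
    public interface Connector {
        List<byte[]> readRecords();
    }

    // Hypothetical: a format only knows how to turn one raw record into a row.
    public interface Format {
        List<String> deserialize(byte[] record);
    }

    // A source is any connector paired with any format.
    public static List<List<String>> read(Connector connector, Format format) {
        List<List<String>> rows = new ArrayList<>();
        for (byte[] record : connector.readRecords()) {
            rows.add(format.deserialize(record));
        }
        return rows;
    }

    public static void main(String[] args) {
        // An in-memory stand-in for a file system or Kafka connector.
        Connector inMemory = () -> Arrays.asList(
                "1,alice".getBytes(StandardCharsets.UTF_8),
                "2,bob".getBytes(StandardCharsets.UTF_8));

        // A toy CSV format; an Avro format would plug in the same way.
        Format csv = bytes ->
                Arrays.asList(new String(bytes, StandardCharsets.UTF_8).split(","));

        System.out.println(read(inMemory, csv)); // prints [[1, alice], [2, bob]]
    }
}
```

With this split, supporting a new combination means supplying one more connector or one more format, rather than one source class per pair.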

Best, Fabian


> Build xTableSource from Avro schemas
> ------------------------------------
>
>                 Key: FLINK-9813
>                 URL: https://issues.apache.org/jira/browse/FLINK-9813
>             Project: Flink
>          Issue Type: Wish
>          Components: Table API & SQL
>    Affects Versions: 1.5.0
>            Reporter: François Lacombe
>            Priority: Trivial
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> As Avro provides an efficient formalism for data schemas, it would be great to be able to build Flink table sources from such schema files.
> More info about Avro schemas: [https://avro.apache.org/docs/1.8.1/spec.html#schemas]
> For instance, with CsvTableSource:
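> // Proposed usage: a schema(Schema) method on the builder is the wish, not an existing API.
> Schema.Parser schemaParser = new Schema.Parser();
> Schema tableSchema = schemaParser.parse(new File("avro.json")); // parse(String) would treat the argument as schema JSON, not a path
> CsvTableSource.Builder bld = CsvTableSource.builder().schema(tableSchema);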
>  
> This would give me a ready-to-use CsvTableSource with the columns defined in avro.json.
> It may be possible to do so for every TableSource, since the Avro format is really common and versatile.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)