Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2019/05/10 12:11:12 UTC

[GitHub] [flink] dawidwys commented on issue #8387: [FLINK-11982] [table] File System Connector's support JSON Format and JSON file BatchTableSource

dawidwys commented on issue #8387: [FLINK-11982] [table] File System Connector's support JSON Format and JSON file BatchTableSource
URL: https://github.com/apache/flink/pull/8387#issuecomment-491266766
 
 
   Hi @ambition119 
   I agree with @StephanEwen that this PR mixes the concepts of `connector` (file) and `format` (JSON).
   
   I don't quite understand the comment that `JsonRowFormatFactory` does not support the File System Connector. First, there is no file system connector; second, this is a format factory, so it should have no notion of a connector.
   
   I think you could rework your `JsonBatchTableSource` and `JsonRowInputFormat` to accept any `DeserializationSchema<Row>`, similar to how `org.apache.flink.streaming.connectors.kafka.KafkaTableSourceBase` works.
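
A minimal, self-contained sketch of that idea follows. It does not use Flink's actual classes: `Deserializer<T>` is a hypothetical stand-in for `DeserializationSchema<Row>`, and `LineDelimitedReader` is a hypothetical stand-in for the reworked input format. The point is only that the reader owns the record framing, while the format is passed in from outside.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class FormatAgnosticSource {

    /** Hypothetical stand-in for Flink's DeserializationSchema<T>: raw bytes in, one record out. */
    interface Deserializer<T> {
        T deserialize(byte[] message);
    }

    /**
     * Hypothetical reader that only knows how records are framed (one per line).
     * It has no knowledge of JSON, CSV, or any other format; that is delegated
     * to whatever Deserializer it is given.
     */
    static final class LineDelimitedReader<T> {
        private final Deserializer<T> deserializer;

        LineDelimitedReader(Deserializer<T> deserializer) {
            this.deserializer = deserializer;
        }

        List<T> readAll(String fileContent) {
            List<T> records = new ArrayList<>();
            for (String line : fileContent.split("\n")) {
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                records.add(deserializer.deserialize(line.getBytes(StandardCharsets.UTF_8)));
            }
            return records;
        }
    }

    public static void main(String[] args) {
        // Toy format for illustration only: split each record on commas.
        Deserializer<String[]> commaSeparated =
                bytes -> new String(bytes, StandardCharsets.UTF_8).split(",");

        LineDelimitedReader<String[]> reader = new LineDelimitedReader<>(commaSeparated);
        List<String[]> rows = reader.readAll("a,1\nb,2\n");
        System.out.println(rows.size() + " records read");
    }
}
```

Swapping in a JSON deserializer would then require no change to the reader, which is the separation between connector and format discussed above.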
   
   Another issue, on a more conceptual level, is that the proposed `RowInputFormat` assumes that each line in a file is a separate record. I agree this is probably the most common case, but it is not the only one.
   Files can also be written with a different layout (e.g. Parquet), so we should make that distinction at the connector level as well.
   
   I'm guessing you might have been inspired by `CsvTableSource`, but it was one of the first `TableSource`s, written before we decided to split connectors and formats, and its design is flawed. It also uses `RowCsvInputFormat`, which, as I said in the previous paragraph, applies custom block (in this case line) splitting based on a configurable delimiter.
