Posted to issues@flink.apache.org by "François Lacombe (JIRA)" <ji...@apache.org> on 2018/07/16 14:51:00 UTC

[jira] [Comment Edited] (FLINK-9814) CsvTableSource "lack of column" warning

    [ https://issues.apache.org/jira/browse/FLINK-9814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16545270#comment-16545270 ] 

François Lacombe edited comment on FLINK-9814 at 7/16/18 2:50 PM:
------------------------------------------------------------------

Fabien,

My comments on each point are below:

1) No, because when the main() method runs, the input files may not be known yet (especially with streaming jobs).

2) Maybe, depending on how much I/O overhead it implies, based on what you describe.

3) Yes, that can work. For CSV files and other flat formats, knowing the header is mandatory.

The check should ensure that the input file conforms to the structure we expect.
In CsvTableSource, we declare which fields should be in the file. I want to get an Exception when any input file is missing one of those fields.

Depending on the format, it may also be possible to check types, but not by checking every row, which could imply a lot of processing.

Example:

CsvTableSource.Builder src_builder = CsvTableSource.builder().path(path);
src_builder.field("col1", Types.INT());
src_builder.field("col2", Types.STRING());
src_builder.field("col3", Types.STRING());

We expect a CSV file with 3 columns.
Then, if something else arrives as input:

||col1||col2||col4||
|Col A1|Col A2|blabla|

Exception: where is col3?
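
To make the idea concrete, here is a rough sketch of such a header check. It is not Flink API: the CsvHeaderCheck class and its missingFields and validate methods are hypothetical names, and it assumes the first line of the file is a header and that fields are split on a plain delimiter with no quoting.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Flink: compares a CSV header line
// against the field names declared on the table source.
class CsvHeaderCheck {

    // Returns the expected field names that are absent from the header line.
    static List<String> missingFields(List<String> expected, String headerLine, String delimiter) {
        Set<String> present = new HashSet<>();
        // Quote the delimiter so characters such as '|' are not treated as regex syntax.
        for (String name : headerLine.split(Pattern.quote(delimiter))) {
            present.add(name.trim());
        }
        List<String> missing = new ArrayList<>();
        for (String name : expected) {
            if (!present.contains(name)) {
                missing.add(name);
            }
        }
        return missing;
    }

    // Reads only the first line, so the cost does not grow with the file size.
    static void validate(BufferedReader reader, List<String> expected, String delimiter)
            throws IOException {
        String header = reader.readLine();
        if (header == null) {
            throw new IllegalStateException("Empty input: no header line");
        }
        List<String> missing = missingFields(expected, header, delimiter);
        if (!missing.isEmpty()) {
            throw new IllegalStateException("Missing column(s) in CSV header: " + missing);
        }
    }

    public static void main(String[] args) throws IOException {
        // Same fields as declared on the builder in the example.
        List<String> expected = Arrays.asList("col1", "col2", "col3");
        BufferedReader in = new BufferedReader(
                new StringReader("col1,col2,col4\nCol A1,Col A2,blabla\n"));
        try {
            validate(in, expected, ",");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With the input from the table, the check reports col3 as missing. It only looks at the header line, so it never scans the rows.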

 
  
All the best


> CsvTableSource "lack of column" warning
> ---------------------------------------
>
>                 Key: FLINK-9814
>                 URL: https://issues.apache.org/jira/browse/FLINK-9814
>             Project: Flink
>          Issue Type: Wish
>          Components: Table API & SQL
>    Affects Versions: 1.5.0
>            Reporter: François Lacombe
>            Assignee: vinoyang
>            Priority: Minor
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> The CsvTableSource class is built by defining the columns expected to be found in the corresponding CSV file.
>  
> It would be great to throw an Exception when the CSV file doesn't have the same structure as defined in the source. For backward compatibility, developers should have to explicitly configure the builder to check columns strictly, and then expect an Exception to be thrown if the structure differs.
> This can easily be checked against the file header, if one exists.
> Is this possible?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)