Posted to issues@flink.apache.org by "zl (Jira)" <ji...@apache.org> on 2022/03/18 08:21:00 UTC

[jira] [Comment Edited] (FLINK-26722) the result is wrong when using file connector with csv format

    [ https://issues.apache.org/jira/browse/FLINK-26722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17508641#comment-17508641 ] 

zl edited comment on FLINK-26722 at 3/18/22, 8:20 AM:
------------------------------------------------------

I think it has something to do with field parsing.
 
When using CsvTableSource, we use [RowCsvInputFormat|https://github.com/apache/flink/blob/master/flink-java/src/main/java/org/apache/flink/api/java/io/RowCsvInputFormat.java] to read the data and [StringParser#parseField|https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/types/parser/StringParser.java#L47] to parse string fields. When a string field is empty ("") and emptyColumnAsNull is enabled, [RowCsvInputFormat#L221|https://github.com/apache/flink/blob/master/flink-java/src/main/java/org/apache/flink/api/java/io/RowCsvInputFormat.java#L221] sets the field value to null.

When using the new file connector with csv format, we use [CsvReaderFormat|https://github.com/apache/flink/blob/master/flink-formats/flink-csv/src/main/java/org/apache/flink/formats/csv/CsvReaderFormat.java] to read the data and [CsvToRowDataConverters#convertToString|https://github.com/apache/flink/blob/master/flink-formats/flink-csv/src/main/java/org/apache/flink/formats/csv/CsvToRowDataConverters.java#L271] to parse string fields. When a string field is empty (""), [CsvToRowDataConverters.java#L109|https://github.com/apache/flink/blob/master/flink-formats/flink-csv/src/main/java/org/apache/flink/formats/csv/CsvToRowDataConverters.java#L109] sets the field value to the empty string ("").
 
I think the way CsvTableSource treats empty strings may be more reasonable, and the new file source with csv format should be consistent with it. That is, when *_csv.ignore-parse-errors_* is enabled, [CsvToRowDataConverters#convertToString|https://github.com/apache/flink/blob/master/flink-formats/flink-csv/src/main/java/org/apache/flink/formats/csv/CsvToRowDataConverters.java#L271] should convert an empty string ("") to null; otherwise it should keep ("") as ("").
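
The difference between the two code paths, and the alignment proposed here, can be sketched in plain Java. This is only an illustration under stated assumptions: the class EmptyStringHandling and the methods legacyParse, currentParse, proposedParse and parseLine are hypothetical names, not Flink APIs, and the naive comma split ignores quoting. The real logic lives in RowCsvInputFormat and CsvToRowDataConverters.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; none of these names exist in Flink. */
public class EmptyStringHandling {

    /** Mimics the CsvTableSource behavior described above: "" becomes null if emptyColumnAsNull is on. */
    public static String legacyParse(String field, boolean emptyColumnAsNull) {
        if (emptyColumnAsNull && field.isEmpty()) {
            return null;
        }
        return field;
    }

    /** Mimics the new file connector behavior described above: "" stays "". */
    public static String currentParse(String field) {
        return field;
    }

    /** Proposed behavior: "" becomes null only when ignoreParseErrors is on (assumed flag name). */
    public static String proposedParse(String field, boolean ignoreParseErrors) {
        if (ignoreParseErrors && field.isEmpty()) {
            return null;
        }
        return field;
    }

    /** Splits one line on commas (no quoting support) and applies the proposed rule to each field. */
    public static List<String> parseLine(String line, boolean ignoreParseErrors) {
        // A limit of -1 keeps trailing empty fields instead of dropping them.
        String[] raw = line.split(",", -1);
        List<String> fields = new ArrayList<>();
        for (String field : raw) {
            fields.add(proposedParse(field, ignoreParseErrors));
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parseLine("a,,c", true));  // prints [a, null, c]
        System.out.println(parseLine("a,,c", false)); // prints [a, , c]
    }
}
```

With the flag on, the middle column of "a,,c" is null, which matches what the comment describes for CsvTableSource with emptyColumnAsNull; with the flag off, it stays an empty string, as the new connector does today.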


was (Author: leo zhou):
I think it has something to do with field parsing.
 
When using CsvTableSource, we use [RowCsvInputFormat|https://github.com/apache/flink/blob/master/flink-java/src/main/java/org/apache/flink/api/java/io/RowCsvInputFormat.java] to read the data and [StringParser#parseField|https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/types/parser/StringParser.java#L47] to parse string fields. When a string field is empty ("") and emptyColumnAsNull is enabled, [RowCsvInputFormat#L221|https://github.com/apache/flink/blob/master/flink-java/src/main/java/org/apache/flink/api/java/io/RowCsvInputFormat.java#L221] sets the field value to null.

When using the new file connector with csv format, we use [CsvReaderFormat|https://github.com/apache/flink/blob/master/flink-formats/flink-csv/src/main/java/org/apache/flink/formats/csv/CsvReaderFormat.java] to read the data and [CsvToRowDataConverters#convertToString|https://github.com/apache/flink/blob/master/flink-formats/flink-csv/src/main/java/org/apache/flink/formats/csv/CsvToRowDataConverters.java#L271] to parse string fields. When a string field is empty (""), [CsvToRowDataConverters.java#L109|https://github.com/apache/flink/blob/master/flink-formats/flink-csv/src/main/java/org/apache/flink/formats/csv/CsvToRowDataConverters.java#L109] sets the field value to the empty string ("").
 
I think the way CsvTableSource treats empty strings may be more reasonable, and the new file source with csv format should be consistent with it.

> the result is wrong when using file connector with csv format
> -------------------------------------------------------------
>
>                 Key: FLINK-26722
>                 URL: https://issues.apache.org/jira/browse/FLINK-26722
>             Project: Flink
>          Issue Type: Bug
>          Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>            Reporter: zl
>            Priority: Major
>         Attachments: CsvTest1.java, example.csv, image-2022-03-18-15-32-28-914.png
>
>
> CsvTest1.java executes the same query on the same dataset (attachment example.csv) with CsvTableSource and with the new file connector, but the results differ. The results are as follows:
> !image-2022-03-18-15-32-28-914.png!



--
This message was sent by Atlassian Jira
(v8.20.1#820001)