Posted to issues@flink.apache.org by "Mikhail Lipkovich (JIRA)" <ji...@apache.org> on 2017/09/01 16:57:00 UTC

[jira] [Commented] (FLINK-6016) Newlines should be valid in quoted strings in CSV

    [ https://issues.apache.org/jira/browse/FLINK-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16150827#comment-16150827 ] 

Mikhail Lipkovich commented on FLINK-6016:
------------------------------------------

Thank you for the reply, Luke.
Currently FileInputFormat identifies splits using only information about blocks; no data is actually read. If I understand you correctly, the suggestion is to modify this reader so that it downloads all blocks, parses them with respect to quoted newline characters, and returns the split boundaries. The data would therefore be traversed twice: once in a single thread to identify the splits, and a second time for the actual data processing.
I could probably implement this, but I think it would be better for me to take on a few easier tasks before diving into this one.
Please let me know if I have misunderstood your comment.
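
To make the first pass concrete, here is a minimal sketch of how I understand the split identification step. It is not Flink code: the class QuoteAwareSplitFinder, the method findSplitStarts and the blockSize parameter are hypothetical names, and the sketch assumes the whole input is already available as a byte array in an ASCII-compatible encoding, with '"' as the quote character and '\n' as the record delimiter.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Flink: finds quote-aware split starts in CSV data. */
class QuoteAwareSplitFinder {

    /**
     * Scans the data once, in a single thread, and returns the offset of the
     * first record that starts at or after each nominal block boundary.
     * A newline inside a quoted field is never treated as a record end.
     */
    static List<Long> findSplitStarts(byte[] data, int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        List<Long> starts = new ArrayList<>();
        if (data.length == 0) {
            return starts;
        }
        starts.add(0L);                 // the first split always starts at offset 0
        long nextBoundary = blockSize;  // nominal boundary the next split should cover
        boolean inQuotes = false;

        for (int i = 0; i < data.length; i++) {
            byte b = data[i];
            if (b == '"') {
                // An escaped quote ("") toggles twice, so the state stays correct.
                inQuotes = !inQuotes;
            } else if (b == '\n' && !inQuotes) {
                long recordStart = i + 1L;
                if (recordStart >= nextBoundary && recordStart < data.length) {
                    starts.add(recordStart);
                    // Skip every nominal boundary that this record start already passed.
                    while (nextBoundary <= recordStart) {
                        nextBoundary += blockSize;
                    }
                }
            }
        }
        return starts;
    }
}
```

For the 14-byte input a,"3\n4",5\nb,c\n with a block size of 4, this returns the offsets [0, 10]: the newline at offset 4 sits inside the quoted field, so no split starts at offset 5. The second pass would then read each split from its start offset up to the next one. The cost is the one I mentioned above: the scan is sequential, because the quote state at any offset depends on everything before it.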

> Newlines should be valid in quoted strings in CSV
> -------------------------------------------------
>
>                 Key: FLINK-6016
>                 URL: https://issues.apache.org/jira/browse/FLINK-6016
>             Project: Flink
>          Issue Type: Bug
>          Components: Batch Connectors and Input/Output Formats
>    Affects Versions: 1.2.0
>            Reporter: Luke Hutchison
>
> The RFC for the CSV format specifies that newlines are valid in quoted strings in CSV:
> https://tools.ietf.org/html/rfc4180
> However, when parsing a CSV file with Flink containing a newline, such as:
> {noformat}
> "3
> 4",5
> {noformat}
> you get this exception:
> {noformat}
> Line could not be parsed: '"3'
> ParserError UNTERMINATED_QUOTED_STRING 
> Expect field types: class java.lang.String, class java.lang.String 
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)