Posted to user@flink.apache.org by ebru <b2...@cs.hacettepe.edu.tr> on 2017/11/24 10:40:28 UTC

Dataset read csv file problem

Hello all,

We are trying to read CSV files in which some fields contain the \n character, which is also the line delimiter. We used the parseQuotedStrings('\"')
method, but it only ignores field delimiters inside quoted strings, so we couldn’t parse the fields that contain the \n character. How can we solve this problem?

-Ebru

Re: Dataset read csv file problem

Posted by ebru <b2...@cs.hacettepe.edu.tr>.
Thank you Fabian, we’ve implemented a custom CsvInputFormat.


> On 24 Nov 2017, at 15:35, Fabian Hueske <fh...@gmail.com> wrote:
> 
> Hi Ebru,
> 
> This case is not supported by Flink's CsvInputFormat. The problem is that such a file cannot be read in parallel, because it is not possible to identify record boundaries if you start reading in the middle of the file.
> We have a new CsvInputFormat under development that follows the RFC 4180 standard, which will have a parameter to support row delimiters that are encapsulated in a String field.
> 
> Until that is available, the only solution is to implement a custom InputFormat.
> 
> Best, Fabian
> 


Re: Dataset read csv file problem

Posted by Fabian Hueske <fh...@gmail.com>.
Hi Ebru,

This case is not supported by Flink's CsvInputFormat. The problem is that
such a file cannot be read in parallel, because it is not possible to
identify record boundaries if you start reading in the middle of the file.
We have a new CsvInputFormat under development that follows the RFC 4180
standard, which will have a parameter to support row delimiters that are
encapsulated in a String field.

Until that is available, the only solution is to implement a custom
InputFormat.

Best, Fabian
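
To illustrate the point about record boundaries, here is a minimal sketch in plain Java. It is not Flink's API and not the custom format mentioned in this thread; the class name QuotedCsvRecords and the method splitRecords are hypothetical. It assumes '"' is the quote character and '\n' is the row delimiter. A custom InputFormat would presumably wrap similar logic and read each file sequentially from its start rather than in splits.

```java
import java.util.ArrayList;
import java.util.List;

public class QuotedCsvRecords {

    /**
     * Splits CSV text into records, treating '\n' inside double quotes
     * as field data rather than as a row delimiter.
     */
    public static List<String> splitRecords(String text) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '"') {
                // Each quote toggles the state; an escaped quote ("")
                // toggles twice and leaves the state unchanged.
                inQuotes = !inQuotes;
                current.append(c);
            } else if (c == '\n' && !inQuotes) {
                // Unquoted newline: the current record is complete.
                records.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            // Last record without a trailing newline.
            records.add(current.toString());
        }
        return records;
    }

    public static void main(String[] args) {
        String csv = "1,\"first line\nsecond line\"\n2,plain\n";

        // Reading from the start keeps the quoted newline inside record 1.
        System.out.println(splitRecords(csv).size()); // prints 2

        // Reading from the middle of the quoted field loses the quote state:
        // the closing quote is taken for an opening one, so the remaining
        // newlines are treated as field data.
        int offset = csv.indexOf("second");
        System.out.println(splitRecords(csv.substring(offset)).size()); // prints 1
    }
}
```

The second call shows why such a file cannot simply be divided into splits: without scanning from the beginning, a reader cannot tell whether a given newline or quote lies inside a quoted field.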
