Posted to issues@nifi.apache.org by "Timea Barna (Jira)" <ji...@apache.org> on 2022/07/21 05:39:00 UTC

[jira] [Updated] (NIFI-10256) CSVRecordReader using RFC 4180 CSV format trimming starting and ending double quotes

     [ https://issues.apache.org/jira/browse/NIFI-10256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Timea Barna updated NIFI-10256:
-------------------------------
    Description: 
Given an input CSV file:

scenario,name
Honors escape beginning," ""John ""PA""RKINSON"""
problematic,"""John ""PA""RKINSON"""
honors escape end,"""John ""PA""RKINSON"

Based on the RFC 4180 spec:

https://datatracker.ietf.org/doc/html/rfc4180

" If double-quotes are used to enclose fields, then a double-quote
appearing inside a field must be escaped by preceding it with
another double quote. For example:

"aaa","b""bb","ccc"
"

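The escaping rule quoted above can be sketched as a small decoder for a single enclosed field. This is only an illustration of the rule, not NiFi code; the function name `unescape_quoted_field` is hypothetical, and the sketch assumes the field has already been split out of its line.

```python
def unescape_quoted_field(raw: str) -> str:
    """Decode one double-quote-enclosed CSV field (hypothetical helper)."""
    if len(raw) < 2 or raw[0] != '"' or raw[-1] != '"':
        raise ValueError("field is not enclosed in double quotes")
    inner = raw[1:-1]  # drop only the enclosing pair of quotes
    out = []
    i = 0
    while i < len(inner):
        ch = inner[i]
        if ch == '"':
            # Inside an enclosed field a quote is only legal as half of a pair.
            if i + 1 >= len(inner) or inner[i + 1] != '"':
                raise ValueError("unescaped double quote inside field")
            i += 1  # skip the escaping quote, keep the escaped one
        out.append(ch)
        i += 1
    return "".join(out)


# The example from the RFC text: "aaa","b""bb","ccc"
fields = [unescape_quoted_field(f) for f in ('"aaa"', '"b""bb"', '"ccc"')]
```

Under this reading, only the outermost pair of quotes is removed, so a field that begins and ends with an escaped quote keeps both of them.
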
The output should be like this:

[
  { "scenario" : "expected_with_space", "name" : " \"John \"PA\"RKINSON\"" },
  { "scenario" : "problematic", "name" : "\"John \"PA\"RKINSON\"" },
  { "scenario" : "expected_remove_end_quote", "name" : "\"John \"PA\"RKINSON" }
]

However, the actual output is:

[
  { "scenario" : "expected_with_space", "name" : " \"John \"PA\"RKINSON\"" },
  { "scenario" : "problematic", "name" : "John \"PA\"RKINSON" },
  { "scenario" : "expected_remove_end_quote", "name" : "\"John \"PA\"RKINSON" }
]

Notice the "problematic" record: the raw field is """John ""PA""RKINSON""", so per the RFC spec the parsed value should be "\"John \"PA\"RKINSON\"". Instead the reader returns "John \"PA\"RKINSON", dropping the starting and ending double quotes.

Notice that the other two records, expected_remove_end_quote and expected_with_space, are parsed as expected given the RFC spec.
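
For comparison, here is a minimal sketch that feeds the same input through Python's standard `csv` module, whose default dialect treats a doubled quote inside a quoted field as one literal quote. It is not NiFi code and makes no claim about how CSVRecordReader is implemented; it only shows the values a parser following the rule above would return. The scenario labels are the ones from the input file.

```python
import csv
import io

# Header plus the three data rows from the report.
CSV_TEXT = (
    'scenario,name\n'
    'Honors escape beginning," ""John ""PA""RKINSON"""\n'
    'problematic,"""John ""PA""RKINSON"""\n'
    'honors escape end,"""John ""PA""RKINSON"\n'
)


def parse_names(text):
    """Map each scenario label to its parsed name value."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["scenario"]: row["name"] for row in reader}


names = parse_names(CSV_TEXT)
for scenario, name in names.items():
    print(f"{scenario}: {name!r}")
```

With this parser the "problematic" row keeps its leading and trailing double quotes, which is the behavior the report expects from the RFC 4180 format.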

> CSVRecordReader using RFC 4180 CSV format trimming starting and ending double quotes
> ------------------------------------------------------------------------------------
>
>                 Key: NIFI-10256
>                 URL: https://issues.apache.org/jira/browse/NIFI-10256
>             Project: Apache NiFi
>          Issue Type: Bug
>            Reporter: Timea Barna
>            Assignee: Timea Barna
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)