Posted to issues@nifi.apache.org by "David Handermann (Jira)" <ji...@apache.org> on 2022/11/09 18:51:00 UTC

[jira] [Updated] (NIFI-10256) CSVRecordReader using RFC 4180 CSV format trimming starting and ending double quotes

     [ https://issues.apache.org/jira/browse/NIFI-10256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Handermann updated NIFI-10256:
------------------------------------
    Fix Version/s: 1.18.0

> CSVRecordReader using RFC 4180 CSV format trimming starting and ending double quotes
> ------------------------------------------------------------------------------------
>
>                 Key: NIFI-10256
>                 URL: https://issues.apache.org/jira/browse/NIFI-10256
>             Project: Apache NiFi
>          Issue Type: Bug
>            Reporter: Timea Barna
>            Assignee: Timea Barna
>            Priority: Major
>             Fix For: 1.18.0
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> Given an input CSV file:
> scenario,name
> Honors escape beginning," ""John ""PA""RKINSON"""
> problematic,"""John ""PA""RKINSON"""
> honors escape end,"""John ""PA""RKINSON"
> Based on the RFC 4180 spec:
> https://datatracker.ietf.org/doc/html/rfc4180
> "If double-quotes are used to enclose fields, then a double-quote
> appearing inside a field must be escaped by preceding it with
> another double quote. For example:
> "aaa","b""bb","ccc"
> "
> The output should be like this:
> [
> { "scenario" : "expected_with_space", "name" : " \"John \"PA\"RKINSON\"" }
> ,
> { "scenario" : "problematic", "name" : "\"John \"PA\"RKINSON\"" }
> ,
> { "scenario" : "expected_remove_end_quote", "name" : "\"John \"PA\"RKINSON" }
> ]
> However, the actual output is:
> [
> { "scenario" : "expected_with_space", "name" : " \"John \"PA\"RKINSON\"" }
> ,
> { "scenario" : "problematic", "name" : "John \"PA\"RKINSON" }
> ,
> { "scenario" : "expected_remove_end_quote", "name" : "\"John \"PA\"RKINSON" }
> ]
> Notice the "problematic" field: the input is """John ""PA""RKINSON""", so per the RFC the value should be "\"John \"PA\"RKINSON\"". Instead the reader returns "John \"PA\"RKINSON", which is missing the starting and ending double quotes.
> The other two fields, expected_remove_end_quote and expected_with_space, are parsed as expected under the RFC.
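
For comparison, here is a minimal sketch of how a parser that applies the RFC 4180 double-quote escaping rule reads the three input rows. It uses Python's standard csv module with its default dialect, not NiFi's CSVRecordReader, so it only illustrates the expected values and says nothing about where the NiFi behavior comes from. The helper name read_names is hypothetical.

```python
import csv
import io

# Input rows copied from the issue description.
CSV_TEXT = (
    'scenario,name\n'
    'Honors escape beginning," ""John ""PA""RKINSON"""\n'
    'problematic,"""John ""PA""RKINSON"""\n'
    'honors escape end,"""John ""PA""RKINSON"\n'
)


def read_names(text):
    """Return {scenario: name} parsed with Python's default CSV dialect.

    Hypothetical helper for illustration only. The default dialect treats
    a doubled quote inside a quoted field as one literal quote, which is
    the escaping rule quoted from RFC 4180 in the description.
    """
    reader = csv.DictReader(io.StringIO(text))
    return {row["scenario"]: row["name"] for row in reader}


names = read_names(CSV_TEXT)
for scenario, name in names.items():
    # repr() makes leading spaces and literal quotes visible.
    print(f"{scenario!r}: {name!r}")
```

With this parser the "problematic" row keeps both its leading and trailing double quote, which matches the expected output listed in the description.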



--
This message was sent by Atlassian Jira
(v8.20.10#820010)