Posted to issues@nifi.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2022/08/08 15:15:00 UTC

[jira] [Commented] (NIFI-10256) CSVRecordReader using RFC 4180 CSV format trimming starting and ending double quotes

    [ https://issues.apache.org/jira/browse/NIFI-10256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17576848#comment-17576848 ] 

ASF subversion and git services commented on NIFI-10256:
--------------------------------------------------------

Commit 26829e5c350766e770ccb3a7f8d3149dd5924409 in nifi's branch refs/heads/main from Timea Barna
[ https://gitbox.apache.org/repos/asf?p=nifi.git;h=26829e5c35 ]

NIFI-10256 CSVRecordReader using RFC 4180 CSV format trimming starting and ending double quotes

NIFI-10256 Addressing review comments, adding extra description, removing unnecessary static import, creating extra constructor

NIFI-10256 Refactoring CSVRecordReader

NIFI-10256 Addressing review comments, adding validator

Signed-off-by: Bence Simon <bs...@apache.org>
This closes #6234


> CSVRecordReader using RFC 4180 CSV format trimming starting and ending double quotes
> ------------------------------------------------------------------------------------
>
>                 Key: NIFI-10256
>                 URL: https://issues.apache.org/jira/browse/NIFI-10256
>             Project: Apache NiFi
>          Issue Type: Bug
>            Reporter: Timea Barna
>            Assignee: Timea Barna
>            Priority: Major
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Given an input CSV file:
> scenario,name
> Honors escape beginning," ""John ""PA""RKINSON"""
> problematic,"""John ""PA""RKINSON"""
> honors escape end,"""John ""PA""RKINSON"
> Based on the RFC 4180 spec:
> https://datatracker.ietf.org/doc/html/rfc4180
> " If double-quotes are used to enclose fields, then a double-quote
> appearing inside a field must be escaped by preceding it with
> another double quote. For example:
> "aaa","b""bb","ccc"
> "
> The output should be like this:
> [
>   { "scenario" : "expected_with_space", "name" : " \"John \"PA\"RKINSON\"" },
>   { "scenario" : "problematic", "name" : "\"John \"PA\"RKINSON\"" },
>   { "scenario" : "expected_remove_end_quote", "name" : "\"John \"PA\"RKINSON" }
> ]
> However, the actual output is:
> [
>   { "scenario" : "expected_with_space", "name" : " \"John \"PA\"RKINSON\"" },
>   { "scenario" : "problematic", "name" : "John \"PA\"RKINSON" },
>   { "scenario" : "expected_remove_end_quote", "name" : "\"John \"PA\"RKINSON" }
> ]
> Note the "problematic" row: its input field is """John ""PA""RKINSON""", which under RFC 4180 should yield "\"John \"PA\"RKINSON\"", but the reader returns "John \"PA\"RKINSON", dropping the starting and ending double quotes.
> The other two rows, expected_with_space and expected_remove_end_quote, are parsed as the RFC specifies.
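To make the expected behaviour concrete, here is a minimal Java sketch of an RFC 4180 field parser applied to the three rows above. The class Rfc4180Demo and its methods parseRecord and trimOuterQuotes are hypothetical names for illustration only; they are not NiFi or Apache Commons CSV APIs, and the parser assumes records contain no embedded line breaks. parseRecord yields the expected value for every row. trimOuterQuotes, which drops one leading and one trailing quote only when the parsed value has both, is one post-processing step that would reproduce exactly the reported output; it is a guess at the symptom, not the actual NiFi code path.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; not part of NiFi or Apache Commons CSV.
class Rfc4180Demo {

    // Splits one CSV record (assumed to contain no embedded line breaks).
    // Per RFC 4180, a field that begins with a double quote is quoted, and a
    // doubled quote ("") inside it stands for one literal double quote.
    static List<String> parseRecord(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        boolean atFieldStart = true;
        int i = 0;
        while (i < line.length()) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        field.append('"'); // "" -> one literal quote
                        i += 2;
                        continue;
                    }
                    inQuotes = false; // closing quote of the field
                } else {
                    field.append(c);
                }
            } else if (c == ',') {
                fields.add(field.toString()); // field separator
                field.setLength(0);
                atFieldStart = true;
                i++;
                continue;
            } else if (c == '"' && atFieldStart) {
                inQuotes = true; // opening quote; not part of the value
            } else {
                field.append(c);
            }
            atFieldStart = false;
            i++;
        }
        fields.add(field.toString());
        return fields;
    }

    // HYPOTHETICAL post-processing step, not taken from NiFi: drops one leading
    // and one trailing double quote, but only when the value has both.
    static String trimOuterQuotes(String value) {
        if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
            return value.substring(1, value.length() - 1);
        }
        return value;
    }

    public static void main(String[] args) {
        // The three data rows from the report, as Java string literals.
        String[] rows = {
            "Honors escape beginning,\" \"\"John \"\"PA\"\"RKINSON\"\"\"",
            "problematic,\"\"\"John \"\"PA\"\"RKINSON\"\"\"",
            "honors escape end,\"\"\"John \"\"PA\"\"RKINSON\""
        };
        for (String row : rows) {
            List<String> f = parseRecord(row);
            System.out.println(f.get(0) + " | RFC 4180: [" + f.get(1)
                    + "] | trimmed: [" + trimOuterQuotes(f.get(1)) + "]");
        }
    }
}
```

Running main prints, for each row, the RFC 4180 value next to the trimmed value. Only the "problematic" row differs between the two, because it is the only value that both starts and ends with a double quote, which matches the pattern in the report.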



--
This message was sent by Atlassian Jira
(v8.20.10#820010)