Posted to issues@nifi.apache.org by "Pierre Villard (Jira)" <ji...@apache.org> on 2019/12/27 17:27:00 UTC

[jira] [Commented] (NIFI-6967) Choosing Jackson Parser for CSVReader Doesn't Actually Choose It

    [ https://issues.apache.org/jira/browse/NIFI-6967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17004281#comment-17004281 ] 

Pierre Villard commented on NIFI-6967:
--------------------------------------

As far as I can see in the code, the Apache Commons CSV parser is always used to infer the schema from the CSV, and only then is the correct record reader parser selected based on the controller service configuration. The catch is that inferring the field types requires parsing the records. In your situation I'd change the Schema Access Strategy from "Infer Schema" to "Use String Fields From Header".

The next thing to know (and the documentation probably needs to be improved) is that you need to set CSV Format to "Custom" for the processor to actually use the separator, quote character, escape character, etc. properties. Otherwise the tab-delimited format will use its own defaults.

By changing the configuration as described, and by setting the quote and escape characters to characters you're sure never to see in your data (like weird symbols), I got your example working.
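
To illustrate why the quote character matters for tab-delimited data, here is a minimal sketch using Python's standard csv module. It is only an analogy: it is not NiFi, Commons CSV, or Jackson, and the sample data and the replacement quote character are made up for the illustration.

```python
import csv
import io

# Hypothetical tab-delimited sample (not taken from the attached flow):
# the "note" field of the first data row contains a stray double quote.
data = 'id\tnote\tqty\n1\t"unbalanced\t5\n2\tplain\t7\n'


def parse(text, quotechar):
    """Parse tab-delimited text; only the quote character varies."""
    reader = csv.reader(io.StringIO(text), delimiter="\t", quotechar=quotechar)
    return list(reader)


# Default quote character: the stray quote opens a quoted field that is
# never closed, so the following tabs and newlines are swallowed into it.
default_rows = parse(data, '"')

# Quote character set to a symbol assumed never to occur in the data:
# the stray double quote is now treated as ordinary field content.
custom_rows = parse(data, "\u00a4")

for row in custom_rows:
    print(row)
```

With the unused symbol as quote character each line keeps its three fields, which is the same idea as picking "weird symbols" for the quote and escape characters in the reader configuration.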

I don't know if we should change the way we infer the schema - I don't have a strong opinion about this.

I hope the above explanations provide some help for your use case.

> Choosing Jackson Parser for CSVReader Doesn't Actually Choose It
> ----------------------------------------------------------------
>
>                 Key: NIFI-6967
>                 URL: https://issues.apache.org/jira/browse/NIFI-6967
>             Project: Apache NiFi
>          Issue Type: Bug
>            Reporter: Shawn Weeks
>            Priority: Minor
>         Attachments: Jackson_Bug.xml, nifi_jackson_log.txt
>
>
> While looking at NIFI-6966 I discovered that choosing Jackson CSV as the CSV Parser in CSVReader doesn't actually use Jackson's parser. No idea why. I've attached an example with the log I see.
> NiFi Version Information
> 1.10.0
> 10/29/2019 09:56:52 CDT
> Tagged nifi-1.10.0-RC3 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)