Posted to issues@camel.apache.org by "Önder Sezgin (JIRA)" <ji...@apache.org> on 2018/10/05 22:56:00 UTC

[jira] [Assigned] (CAMEL-12698) Unmarshaling a CSV file with the NEL (next line) character will cause Bindy to misread the entire file

     [ https://issues.apache.org/jira/browse/CAMEL-12698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Önder Sezgin reassigned CAMEL-12698:
------------------------------------

    Assignee: Önder Sezgin

> Unmarshaling a CSV file with the NEL (next line) character will cause Bindy to misread the entire file
> ------------------------------------------------------------------------------------------------------
>
>                 Key: CAMEL-12698
>                 URL: https://issues.apache.org/jira/browse/CAMEL-12698
>             Project: Camel
>          Issue Type: Improvement
>          Components: camel-bindy
>    Affects Versions: 2.22.0
>            Reporter: Jason Black
>            Assignee: Önder Sezgin
>            Priority: Minor
>             Fix For: 2.23.0
>
>
> I am using Apache Camel to process a lot of large CSV files, and relying on Bindy to assist with unmarshalling them into POJOs.
> We have an upstream data bug which causes one of our records to contain the Unicode character [NEL|http://www.fileformat.info/info/unicode/char/85/index.htm]. While we work through the cause of that, I was curious about what Bindy actually does with it.  We rely on the unmarshal process to perform a batch insert, and because our POJO is missing certain fields, we started observing that the
> Bindy relies on Scanner to read lines in a large file; however, Scanner also treats the NEL character as a line separator, so a NEL inside a record splits that record across two lines.  The modern Files API does not do this and breaks lines only on the conventional line terminators (e.g. \n, \r, or \r\n).
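
The difference described above can be reproduced outside of Camel. The following is a minimal sketch, not Bindy's actual code: the class name, method names, and sample data are made up for illustration, and the input is an in-memory string rather than a file.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.stream.Collectors;

// Hypothetical demo class; none of these names come from camel-bindy.
class NelLineSplitDemo {
    // Two CSV records; the first has a stray NEL (U+0085) inside a field.
    static final String CSV = "1,foo\u0085bar,x\n2,baz,y\n";

    // Reads lines the way a default Scanner does: nextLine() also
    // treats U+0085 (and U+2028/U+2029) as line separators.
    static List<String> withScanner(String text) {
        List<String> lines = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNextLine()) {
                lines.add(scanner.nextLine());
            }
        }
        return lines;
    }

    // Reads lines with BufferedReader, which only breaks on \n, \r, or \r\n.
    static List<String> withBufferedReader(String text) {
        return new BufferedReader(new StringReader(text))
                .lines()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println("Scanner lines:        " + withScanner(CSV));
        System.out.println("BufferedReader lines: " + withBufferedReader(CSV));
    }
}
```

With this sample input, the Scanner variant should report three lines (the first record is cut at the NEL), while the BufferedReader variant should report the two records intact.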
> There are two ways to fix this from what I've been able to smoke test:
>  * Change the Scanner implementation to use a delimiter of only the more traditional newline characters
>  * Use Java 8's Files API and stream the file in
> I would personally want to use the Files API to handle this since it's more robust and capable of higher performance, but I'll explore both approaches and see where I end up.
>  
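
The two approaches listed in the description could look roughly like the sketch below. This is an illustration under assumptions, not the change that went into camel-bindy: the class and method names are hypothetical, UTF-8 is assumed as the file encoding, and the delimiter regex is one possible choice for "traditional newline characters".

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class; names are illustrative only.
class NelSafeLineReaders {
    // Approach 1: keep Scanner, but split tokens only on \r\n, \n, or \r,
    // so a NEL (U+0085) inside a record is left untouched.
    static List<String> readWithScanner(Path file) throws IOException {
        List<String> records = new ArrayList<>();
        try (Scanner scanner = new Scanner(file, "UTF-8")) {
            scanner.useDelimiter("\\r\\n|[\\r\\n]");
            while (scanner.hasNext()) {
                records.add(scanner.next());
            }
        }
        return records;
    }

    // Approach 2: stream the file with the Java 8 Files API, which
    // breaks lines on \n, \r, or \r\n only.
    static List<String> readWithFilesApi(Path file) throws IOException {
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
            return lines.collect(Collectors.toList());
        }
    }

    // Writes a small sample file and returns the record counts
    // {scannerCount, filesApiCount} for comparison.
    static int[] demoCounts() {
        try {
            Path file = Files.createTempFile("nel-demo", ".csv");
            try {
                byte[] data = "1,foo\u0085bar,x\r\n2,baz,y\n".getBytes(StandardCharsets.UTF_8);
                Files.write(file, data);
                return new int[] {readWithScanner(file).size(), readWithFilesApi(file).size()};
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] counts = demoCounts();
        System.out.println("Scanner with custom delimiter: " + counts[0] + " records");
        System.out.println("Files.lines:                   " + counts[1] + " records");
    }
}
```

Both variants should read the sample as two records. One difference to keep in mind: with a custom delimiter the Scanner version is driven by hasNext()/next() rather than hasNextLine()/nextLine(), so any code that depends on the line-oriented methods would need to be adjusted.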



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)