Posted to dev@phoenix.apache.org by "James Violette (JIRA)" <ji...@apache.org> on 2014/02/18 23:41:19 UTC

[jira] [Commented] (PHOENIX-53) CSV loader fails on empty line

    [ https://issues.apache.org/jira/browse/PHOENIX-53?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904732#comment-13904732 ] 

James Violette commented on PHOENIX-53:
---------------------------------------

We found that the Apache Commons CSV library (http://commons.apache.org/proper/commons-csv/) has the features we need to handle different CSV formats. Although it has not yet had an official release, the library is under active development (last commit 1/2014) and works well as a SNAPSHOT build. By contrast, the opencsv project (http://sourceforge.net/projects/opencsv/files/opencsv/) has been dormant since 2011, long enough to prompt a fork (http://code.google.com/p/opencsv/).

We found that the opencsv loader accepted records with badly encapsulated meta-characters rather than rejecting them, and then misparsed the input, which resulted in a 50% load success rate and significant data corruption. By comparison, the Apache Commons CSV parser threw an exception when the CSV format was not followed. That exception allowed us to isolate the issue and prevented subsequent records from being corrupted.
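
To illustrate the fail-fast behavior described above, here is a minimal sketch of a strict single-line parser that throws on malformed quoting instead of guessing. It is not code from Commons CSV or opencsv; the class name StrictCsvLine, the comma delimiter, and the doubled-quote escape are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not taken from Commons CSV or opencsv.
final class StrictCsvLine {
    private StrictCsvLine() {}

    /** Splits one comma-separated record, throwing on malformed quoting. */
    static List<String> parse(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        int i = 0;
        while (i < line.length()) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    // A doubled quote inside a quoted field is an escaped quote.
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        field.append('"');
                        i += 2;
                        continue;
                    }
                    inQuotes = false;
                    // A closing quote must be followed by a delimiter or end of line.
                    if (i + 1 < line.length() && line.charAt(i + 1) != ',') {
                        throw new IllegalArgumentException(
                                "unexpected character after closing quote at index " + (i + 1));
                    }
                } else {
                    field.append(c);
                }
            } else if (c == '"') {
                // An opening quote is only legal at the start of a field.
                if (field.length() > 0) {
                    throw new IllegalArgumentException(
                            "quote inside unquoted field at index " + i);
                }
                inQuotes = true;
            } else if (c == ',') {
                fields.add(field.toString());
                field.setLength(0);
            } else {
                field.append(c);
            }
            i++;
        }
        if (inQuotes) {
            throw new IllegalArgumentException("unterminated quoted field");
        }
        fields.add(field.toString());
        return fields;
    }
}
```

With this sketch, a well-formed record such as "a,b",c yields two fields, while a record with an unterminated quote raises an exception that the loader can catch and report, rather than silently producing corrupt rows.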

We used the commons-csv source from this repo:
http://svn.apache.org/repos/asf/commons/proper/csv/trunk/

We have created a CSVCommonsLoader that uses this parser. The attached patch replaces the current CSVLoader in PhoenixRuntime with it. If required, someone can update the command-line parameters to allow switching between the parsers. The associated CSVCommonsLoaderTests run the regression tests against the supplied test data, plus two new tests covering the encapsulated characters.



> CSV loader fails on empty line
> ------------------------------
>
>                 Key: PHOENIX-53
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-53
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 2.2.3
>            Reporter: James Violette
>
> In org.apache.phoenix.util.CSVLoader, the upsert fails if it encounters an empty line. This occurs if all lines end with the newline character and the reader returns an empty line at the end.
> A fix is to add a guard while reading the next line:
>     public void upsert(CSVReader reader) throws Exception {
>         ...
>         while ((nextLine = reader.readNext()) != null) {
>             // Skip empty lines instead of attempting an upsert.
>             if (nextLine.length == 0) {
>                 continue;
>             }
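
The guard quoted above can be tried in isolation with a small stand-alone sketch. It does not use CSVLoader or opencsv: the class name EmptyLineGuard is hypothetical, and a naive split on commas stands in for the real CSV reader, so the guard here tests the raw line rather than the parsed array.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; the naive split is a stand-in for a real CSV parser.
final class EmptyLineGuard {
    private EmptyLineGuard() {}

    /** Reads all records from the reader, skipping empty lines. */
    static List<String[]> readRecords(BufferedReader reader) throws IOException {
        List<String[]> records = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            // Guard: skip empty or whitespace-only lines instead of upserting them.
            if (line.trim().isEmpty()) {
                continue;
            }
            // Limit -1 keeps trailing empty fields such as "a,b,".
            records.add(line.split(",", -1));
        }
        return records;
    }
}
```

Passing a reader over the text "a,b", an empty line, then "c,d" returns two records, and the empty line never reaches the code that would build the upsert.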


