You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@commons.apache.org by "Lars Bruun-Hansen (Jira)" <ji...@apache.org> on 2019/11/01 06:18:00 UTC

[jira] [Commented] (CSV-254) POSTGRESQL_CSV cannot parse correctly (null vs zero-length string)

    [ https://issues.apache.org/jira/browse/CSV-254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16964633#comment-16964633 ] 

Lars Bruun-Hansen commented on CSV-254:
---------------------------------------

 
{quote}Not quite, see https://issues.apache.org/jira/browse/CSV-253
 An empty string is not the same as a {{null}}.
{quote}
Exactly! And this is essentially what this ticket is about. Not sure I understand the comment. ;)

The {{POSTGRESQL_CSV}} format cannot properly distinguish between empty string and null while parsing. So it gets it wrong. The reason why {{POSTGRESQL_CSV}} format can't is not due as much to the format itself, but to an underlying deficiency in the Parser which is proposed fixed in [PR51|https://github.com/apache/commons-csv/pull/51] / [CSV-253|https://issues.apache.org/jira/browse/CSV-253].

Please suggest how to proceed. I'll be happy to do more work if required or change stuff in PR51or whatever.

Thx.



> POSTGRESQL_CSV cannot parse correctly (null vs zero-length string)
> ------------------------------------------------------------------
>
>                 Key: CSV-254
>                 URL: https://issues.apache.org/jira/browse/CSV-254
>             Project: Commons CSV
>          Issue Type: Bug
>            Reporter: Lars Bruun-Hansen
>            Priority: Major
>
>  
>  PostgresSQL by default creates CSV files which uses absent value for database NULL values. Consider the following test in PostgresSQL:
> {code:sql}
> CREATE TABLE mytab (
> 	col1   varchar(10),
> 	col2   int,
> 	col3   varchar(10)
> );
> INSERT INTO mytab VALUES ('AA', 33, null);
> INSERT INTO mytab VALUES ('AA', null, '');
> INSERT INTO mytab VALUES (null, 33, 'CC');
> {code}
> and then exporting it to CSV:
> {code:sql}
> \copy mytab TO STDOUT WITH csv
> {code}
> will produce CSV output as follows:
> {noformat}
> AA,33,
> AA,,""
> ,33,CC
> {noformat}
> which the library currently will parse as follows using the current {{POSTGRESQL_CSV}} format:
> {noformat}
> "AA","33",null
> "AA",null,null
> null,"33","CC"
> {noformat}
> Row no 2 is incorrect when comparing to the actual database table contents.
> h2. The fix
> Therefore the declaration of {{POSTGRESQL_CSV}} must be changed to:
> {code:java}
> public static final CSVFormat POSTGRESQL_CSV = DEFAULT
>             .withDelimiter(COMMA)
>             .withEscape(DOUBLE_QUOTE_CHAR)
>             .withIgnoreEmptyLines(false)
>             .withQuote(DOUBLE_QUOTE_CHAR)
>             .withRecordSeparator(LF)
>             .withAbsentMeansNull(true)   // added
>             .withNullString(EMPTY)
>             .withQuoteMode(QuoteMode.ALL_NON_NULL);
> {code}
> (this depends on [PR51|https://github.com/apache/commons-csv/pull/51])
>  With the above the parser will give the following result instead: 
> {noformat}
> "AA","33",null
> "AA",null,""
> null,"33","CC"
> {noformat}
> which is the expected result.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)