You are viewing a plain text version of this content. The canonical link for it is here.

Posted to issues@commons.apache.org by "Gary D. Gregory (Jira)" <ji...@apache.org> on 2019/10/30 22:23:00 UTC

[jira] [Commented] (CSV-254) POSTGRESQL_CSV cannot parse correctly (null vs zero-length string)

    [ https://issues.apache.org/jira/browse/CSV-254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963478#comment-16963478 ] 

Gary D. Gregory commented on CSV-254:
-------------------------------------

Not quite, see https://issues.apache.org/jira/browse/CSV-253
An empty string is not the same as a {{null}}.

> POSTGRESQL_CSV cannot parse correctly (null vs zero-length string)
> ------------------------------------------------------------------
>
>                 Key: CSV-254
>                 URL: https://issues.apache.org/jira/browse/CSV-254
>             Project: Commons CSV
>          Issue Type: Bug
>            Reporter: Lars Bruun-Hansen
>            Priority: Major
>
>  
>  PostgresSQL by default creates CSV files which uses absent value for database NULL values. Consider the following test in PostgresSQL:
> {code:sql}
> CREATE TABLE mytab (
> 	col1   varchar(10),
> 	col2   int,
> 	col3   varchar(10)
> );
> INSERT INTO mytab VALUES ('AA', 33, null);
> INSERT INTO mytab VALUES ('AA', null, '');
> INSERT INTO mytab VALUES (null, 33, 'CC');
> {code}
> and then exporting it to CSV:
> {code:sql}
> \copy mytab TO STDOUT WITH csv
> {code}
> will produce CSV output as follows:
> {noformat}
> AA,33,
> AA,,""
> ,33,CC
> {noformat}
> which the library currently will parse as follows using the current {{POSTGRESQL_CSV}} format:
> {noformat}
> "AA","33",null
> "AA",null,null
> null,"33","CC"
> {noformat}
> Row no 2 is incorrect when comparing to the actual database table contents.
> h2. The fix
> Therefore the declaration of {{POSTGRESQL_CSV}} must be changed to:
> {code:java}
> public static final CSVFormat POSTGRESQL_CSV = DEFAULT
>             .withDelimiter(COMMA)
>             .withEscape(DOUBLE_QUOTE_CHAR)
>             .withIgnoreEmptyLines(false)
>             .withQuote(DOUBLE_QUOTE_CHAR)
>             .withRecordSeparator(LF)
>             .withAbsentMeansNull(true)   // added
>             .withNullString(EMPTY)
>             .withQuoteMode(QuoteMode.ALL_NON_NULL);
> {code}
> (this depends on [PR51|https://github.com/apache/commons-csv/pull/51])
>  With the above the parser will give the following result instead: 
> {noformat}
> "AA","33",null
> "AA",null,""
> null,"33","CC"
> {noformat}
> which is the expected result.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)