Posted to issues@commons.apache.org by "Anatoliy Artemenko (Jira)" <ji...@apache.org> on 2021/09/30 20:22:00 UTC

[jira] [Commented] (CSV-290) Produced CSV using PostgreSQL format cannot be read

    [ https://issues.apache.org/jira/browse/CSV-290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17423003#comment-17423003 ] 

Anatoliy Artemenko commented on CSV-290:
----------------------------------------

In the Lexer class, method parseEncapsulatedToken, I made the following modification:
{code:java}
while (true) {
    c = reader.read();

    // if (isEscape(c)) {                  // original condition, commented out
    if (isEscape(c) && !isQuoteChar(c)) {  // new condition: never treat the quote char as an escape
        // ... rest of the loop body unchanged
{code}
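If I understand the logic correctly, this method is invoked only when a "cell" value starts with the QUOTE character. This means the ESCAPE character is never needed there to escape the DELIMITER. In POSTGRESQL_CSV the ESCAPE and QUOTE characters are both the double quote, so the original check treats the closing quote as an escape, consumes the character that follows it, and the token never terminates (hence "EOF reached before encapsulated token finished"). With the change, the quote character is always handled as a quote, and a doubled quote still yields one literal quote, as the following test checks: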
{code:java}
@Test
public void testQuotedString() throws IOException {
    // Intended for LexerTest: createLexer(...) and matches(...) are its existing test helpers.
    /*
     * file: "a","b""b","c"
     */
    final String code = "\"a\",\"b\"\"b\",\"c\"";
    final CSVFormat format = CSVFormat.POSTGRESQL_CSV.withIgnoreEmptyLines(false);

    try (final Lexer parser = createLexer(code, format)) {
        assertThat(parser.nextToken(new Token()), matches(TOKEN, "a"));
        assertThat(parser.nextToken(new Token()), matches(TOKEN, "b\"b")); // doubled quote -> one literal quote
        assertThat(parser.nextToken(new Token()), matches(EOF, "c"));      // last token is returned with type EOF
    }
}
{code}
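To illustrate the ordering problem outside the library, here is a simplified, hypothetical model of reading one quoted field. QuotedFieldSketch and readQuoted are made-up names; this is not Commons CSV's Lexer, and its escape handling is reduced to "keep the next character literally". With QUOTE and ESCAPE both set to the double quote, as in POSTGRESQL_CSV, checking the escape first swallows the closing quote and runs to the end of input, while skipping the escape branch for the quote character closes the field as expected.

```java
/**
 * Hypothetical, simplified model of reading one quoted field (not Commons CSV's Lexer).
 * QUOTE and ESCAPE are both '"', as in CSVFormat.POSTGRESQL_CSV.
 */
public class QuotedFieldSketch {
    static final char QUOTE = '"';
    static final char ESCAPE = '"';

    /**
     * Reads the field whose opening quote is at input.charAt(start) and returns its value.
     * escapeFirst = true : test the escape char before the quote char (original order).
     * escapeFirst = false: skip the escape branch when c is also the quote char (proposed fix).
     */
    static String readQuoted(String input, int start, boolean escapeFirst) {
        StringBuilder value = new StringBuilder();
        int i = start + 1; // skip the opening quote
        while (i < input.length()) {
            char c = input.charAt(i);
            boolean escape = c == ESCAPE && (escapeFirst || c != QUOTE);
            if (escape) {
                // Simplified escape handling: keep the next character literally.
                if (i + 1 >= input.length()) {
                    break;
                }
                value.append(input.charAt(i + 1));
                i += 2;
            } else if (c == QUOTE) {
                if (i + 1 < input.length() && input.charAt(i + 1) == QUOTE) {
                    value.append(QUOTE); // doubled quote "" stands for one literal quote
                    i += 2;
                } else {
                    return value.toString(); // closing quote ends the field
                }
            } else {
                value.append(c);
                i++;
            }
        }
        throw new IllegalStateException("EOF reached before encapsulated token finished");
    }

    public static void main(String[] args) {
        String line = "\"v11\",\"v12\"\n";
        System.out.println(readQuoted(line, 0, false)); // fixed order: prints v11
        try {
            readQuoted(line, 0, true); // original order: never finds the closing quote
        } catch (IllegalStateException e) {
            System.out.println("original order: " + e.getMessage());
        }
    }
}
```

The fixed order also keeps the doubled-quote case from the test above working: reading `"b""b"` yields `b"b`.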

> Produced CSV using PostgreSQL format cannot be read
> ---------------------------------------------------
>
>                 Key: CSV-290
>                 URL: https://issues.apache.org/jira/browse/CSV-290
>             Project: Commons CSV
>          Issue Type: Bug
>          Components: Parser
>    Affects Versions: 1.6, 1.9.0
>            Reporter: Anatoliy Artemenko
>            Priority: Major
>
> CSV produced using this printer:
>
> CSVPrinter printer = new CSVPrinter(sw, CSVFormat.POSTGRESQL_CSV.withFirstRecordAsHeader());
>
> cannot be read by a parser using the same format:
>
> CSVParser parser = new CSVParser(new StringReader(sw.toString()), CSVFormat.POSTGRESQL_CSV.withFirstRecordAsHeader());
>
> To reproduce:
>
> {code:java}
> import java.io.StringReader;
> import java.io.StringWriter;
> import java.util.Arrays;
> import java.util.Iterator;
> import org.apache.commons.csv.*;
> public class Test {
>     public static void main(String[] args) throws Exception {
>         StringWriter sw = new StringWriter();
>         CSVPrinter printer = new CSVPrinter(sw, CSVFormat.POSTGRESQL_CSV.withFirstRecordAsHeader());
>         printer.printRecord("column1", "column2");
>         printer.printRecord("v11", "v12");
>         printer.printRecord("v21", "v22");
>         printer.close();
>         CSVParser parser = new CSVParser(new StringReader(sw.toString()), CSVFormat.POSTGRESQL_CSV.withFirstRecordAsHeader());
>         System.out.println("headers: " + Arrays.equals(parser.getHeaderNames().toArray(), new String[] {"column1", "column2"}));
>         Iterator<CSVRecord> i = parser.iterator();
>         System.out.println("row: " + Arrays.equals(i.next().toList().toArray(), new String[] {"v11", "v12"}));
>         System.out.println("row: " + Arrays.equals(i.next().toList().toArray(), new String[] {"v21", "v22"}));
>     }
> }
> {code}
> I'd expect the above code to work, but it fails:
> {code:java}
> java.io.IOException: (startline 1) EOF reached before encapsulated token finished
> at org.apache.commons.csv.Lexer.parseEncapsulatedToken(Lexer.java:371) 
> at org.apache.commons.csv.Lexer.nextToken(Lexer.java:285) 
> at org.apache.commons.csv.CSVParser.nextRecord(CSVParser.java:701) 
> at org.apache.commons.csv.CSVParser.createHeaders(CSVParser.java:480) 
> at org.apache.commons.csv.CSVParser.<init>(CSVParser.java:432) 
> at org.apache.commons.csv.CSVParser.<init>(CSVParser.java:398) 
> at Test.main(Test.java:25)
> {code}
>  
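Every cell in the reproduction goes through parseEncapsulatedToken because POSTGRESQL_CSV is configured with QuoteMode.ALL_NON_NULL, so the printer wraps each non-null value in the quote character. The sketch below is a hypothetical stand-in for that rendering (AllNonNullSketch and formatRecord are invented names, not CSVPrinter). It assumes LF as the record separator, a null written as the empty null string without quotes, and embedded quotes doubled as in RFC 4180; it is not guaranteed to match CSVPrinter byte for byte.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch (not CSVPrinter) of rendering a record when every non-null
 * value is quoted, as QuoteMode.ALL_NON_NULL does in POSTGRESQL_CSV.
 * Assumption: an embedded quote is written doubled, per RFC 4180.
 */
public class AllNonNullSketch {
    static final char QUOTE = '"';

    static String formatRecord(List<String> values) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) {
                sb.append(','); // delimiter
            }
            String v = values.get(i);
            if (v == null) {
                continue; // null is written as the empty null string, unquoted
            }
            sb.append(QUOTE);
            for (char c : v.toCharArray()) {
                if (c == QUOTE) {
                    sb.append(QUOTE); // double an embedded quote
                }
                sb.append(c);
            }
            sb.append(QUOTE);
        }
        return sb.append('\n').toString(); // record separator: LF
    }

    public static void main(String[] args) {
        // Prints "column1","column2" then "v11","v12" then "v21","v22", one record per line.
        System.out.print(formatRecord(Arrays.asList("column1", "column2")));
        System.out.print(formatRecord(Arrays.asList("v11", "v12")));
        System.out.print(formatRecord(Arrays.asList("v21", "v22")));
    }
}
```

Since every field in this output opens with the quote character, a lexer that lets the escape check claim that same character cannot find the end of the very first field.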



--
This message was sent by Atlassian Jira
(v8.3.4#803005)