Posted to issues@commons.apache.org by "Anatoliy Artemenko (Jira)" <ji...@apache.org> on 2021/09/30 20:22:00 UTC
[jira] [Commented] (CSV-290) Produced CSV using PostgreSQL format cannot be read
[ https://issues.apache.org/jira/browse/CSV-290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17423003#comment-17423003 ]
Anatoliy Artemenko commented on CSV-290:
----------------------------------------
In the Lexer class, method parseEncapsulatedToken, I made the following modification:
{code:java}
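while (true) {
    c = reader.read();
    // if (isEscape(c)) { // original line, commented out
    if (isEscape(c) && !isQuoteChar(c)) { // replacement: a character that is also the QUOTE is no longer treated as ESCAPE
        // ... rest of the loop body unchanged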
{code}
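If I understand the logic correctly, this method is invoked only when a "cell" value starts with the QUOTE character. This means that, inside such a value, the ESCAPE character is never needed to escape the DELIMITER. So when ESCAPE and QUOTE are the same character, as in POSTGRESQL_CSV (both are "), that character can safely be handled as a QUOTE.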
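A simplified sketch of the proposed check order follows, for illustration only. It is not the Commons CSV Lexer: the names PatchedQuotedField and readQuoted are hypothetical, everything after the closing quote is ignored, and an escaped character is simply taken literally instead of going through readEscape(). It assumes, as in POSTGRESQL_CSV, that QUOTE and ESCAPE may both be ".

{code:java}
// Hypothetical, simplified sketch of the patched rule; not the real Commons CSV Lexer.
class PatchedQuotedField {

    /**
     * Reads one quoted field whose opening QUOTE is at input[start] and returns the
     * unquoted value. escape may be null when no escape character is configured.
     */
    static String readQuoted(String input, int start, char quote, Character escape) {
        if (input.charAt(start) != quote) {
            throw new IllegalArgumentException("field must start with the quote char");
        }
        StringBuilder value = new StringBuilder();
        int i = start + 1;
        while (i < input.length()) {
            char c = input.charAt(i);
            // Patched rule: a character counts as ESCAPE only if it is not also the QUOTE.
            if (escape != null && c == escape && c != quote) {
                if (i + 1 >= input.length()) {
                    throw new IllegalStateException("EOF whilst processing escape sequence");
                }
                value.append(input.charAt(i + 1)); // simplification: escaped char taken literally
                i += 2;
            } else if (c == quote) {
                if (i + 1 < input.length() && input.charAt(i + 1) == quote) {
                    value.append(quote); // doubled quote becomes one literal quote
                    i += 2;
                } else {
                    return value.toString(); // closing quote ends the field
                }
            } else {
                value.append(c);
                i++;
            }
        }
        throw new IllegalStateException("EOF reached before encapsulated token finished");
    }

    public static void main(String[] args) {
        // PostgreSQL-style settings: QUOTE and ESCAPE are both '"'.
        System.out.println(readQuoted("\"b\"\"b\"", 0, '"', '"'));              // prints b"b
        System.out.println(readQuoted("\"column1\",\"column2\"", 0, '"', '"')); // prints column1
        // A distinct escape character keeps working.
        System.out.println(readQuoted("\"a\\\"b\"", 0, '"', '\\'));             // prints a"b
    }
}
{code}

With the original order, the second call would instead treat the quote after column1 as an escape for the following comma, so the field would not end there.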
{code:java}
@Test
public void testQuotedString() throws IOException {
    /*
     * file: "a","b""b","c"
     */
    final String code = "\"a\",\"b\"\"b\",\"c\"";
    final CSVFormat format = CSVFormat.POSTGRESQL_CSV.withIgnoreEmptyLines(false);
    try (final Lexer parser = createLexer(code, format)) {
        assertThat(parser.nextToken(new Token()), matches(TOKEN, "a"));
        assertThat(parser.nextToken(new Token()), matches(TOKEN, "b\"b"));
        assertThat(parser.nextToken(new Token()), matches(EOF, "c"));
    }
}
{code}
> Produced CSV using PostgreSQL format cannot be read
> ---------------------------------------------------
>
> Key: CSV-290
> URL: https://issues.apache.org/jira/browse/CSV-290
> Project: Commons CSV
> Issue Type: Bug
> Components: Parser
> Affects Versions: 1.6, 1.9.0
> Reporter: Anatoliy Artemenko
> Priority: Major
>
> CSV produced with this printer:
>
> CSVPrinter printer = new CSVPrinter(sw, CSVFormat.POSTGRESQL_CSV.withFirstRecordAsHeader());
>
> cannot be read by a parser using the same format:
>
> CSVParser parser = new CSVParser(new StringReader(sw.toString()), CSVFormat.POSTGRESQL_CSV.withFirstRecordAsHeader());
>
> To reproduce:
>
> {code:java}
> // Needs java.io.StringReader, java.io.StringWriter, java.util.Arrays, java.util.Iterator
> // and org.apache.commons.csv.CSVFormat, CSVParser, CSVPrinter, CSVRecord
> StringWriter sw = new StringWriter();
> CSVPrinter printer = new CSVPrinter(sw, CSVFormat.POSTGRESQL_CSV.withFirstRecordAsHeader());
> printer.printRecord("column1", "column2");
> printer.printRecord("v11", "v12");
> printer.printRecord("v21", "v22");
> printer.close();
> CSVParser parser = new CSVParser(new StringReader(sw.toString()), CSVFormat.POSTGRESQL_CSV.withFirstRecordAsHeader());
> System.out.println("headers: " + Arrays.equals(parser.getHeaderNames().toArray(), new String[] {"column1", "column2"}));
> Iterator<CSVRecord> i = parser.iterator();
> System.out.println("row: " + Arrays.equals(i.next().toList().toArray(), new String[] {"v11", "v12"}));
> System.out.println("row: " + Arrays.equals(i.next().toList().toArray(), new String[] {"v21", "v22"}));
> {code}
> I'd expect the above code to work, but it fails:
> {code:java}
> java.io.IOException: (startline 1) EOF reached before encapsulated token finished
> at org.apache.commons.csv.Lexer.parseEncapsulatedToken(Lexer.java:371)
> at org.apache.commons.csv.Lexer.nextToken(Lexer.java:285)
> at org.apache.commons.csv.CSVParser.nextRecord(CSVParser.java:701)
> at org.apache.commons.csv.CSVParser.createHeaders(CSVParser.java:480)
> at org.apache.commons.csv.CSVParser.<init>(CSVParser.java:432)
> at org.apache.commons.csv.CSVParser.<init>(CSVParser.java:398)
> at Test.main(Test.java:25)
> {code}
>
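To see why the parser runs to the end of the input, here is a hypothetical emulation of the original order, in which the ESCAPE test comes before the QUOTE test. It is a sketch, not the real Lexer: the names EscapeFirstTrace and readQuotedEscapeFirst are made up, readEscape() is simplified to "take the next character literally", and doubled-quote handling is omitted. It also assumes the printer quotes every value and ends each record with a line feed, which is consistent with the encapsulated token starting on line 1 in the trace.

{code:java}
// Hypothetical emulation of the ORIGINAL check order (ESCAPE tested before QUOTE).
// Simplified: the char after an escape is taken literally; doubled quotes are not handled.
class EscapeFirstTrace {

    static String readQuotedEscapeFirst(String input, char quote, char escape) {
        StringBuilder value = new StringBuilder();
        int i = 1; // input.charAt(0) is assumed to be the opening quote
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == escape) {
                // When ESCAPE == QUOTE every quote lands here, so nothing closes the field.
                if (i + 1 >= input.length()) {
                    throw new IllegalStateException("EOF whilst processing escape sequence");
                }
                value.append(input.charAt(i + 1)); // swallows the next char (',', a line feed, 'v', ...)
                i += 2;
            } else if (c == quote) {
                return value.toString(); // closing quote
            } else {
                value.append(c);
                i++;
            }
        }
        throw new IllegalStateException("EOF reached before encapsulated token finished");
    }

    public static void main(String[] args) {
        // Assumed printer output for the three records in the report.
        String printed = "\"column1\",\"column2\"\n\"v11\",\"v12\"\n\"v21\",\"v22\"\n";
        try {
            readQuotedEscapeFirst(printed, '"', '"');
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // prints EOF reached before encapsulated token finished
        }
        // With a distinct escape character the first field ends at its closing quote.
        System.out.println(readQuotedEscapeFirst(printed, '"', '\\')); // prints column1
    }
}
{code}

The message matches the one in the stack trace above: every quote is consumed as an escape together with the character after it, so the field never closes.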