Posted to issues@nifi.apache.org by "Vadim (JIRA)" <ji...@apache.org> on 2018/08/16 14:17:00 UTC

[jira] [Created] (NIFI-5525) CSVRecordReader fails with StringIndexOutOfBoundsException when field is a double quote

Vadim created NIFI-5525:
---------------------------

             Summary: CSVRecordReader fails with StringIndexOutOfBoundsException when field is a double quote
                 Key: NIFI-5525
                 URL: https://issues.apache.org/jira/browse/NIFI-5525
             Project: Apache NiFi
          Issue Type: Bug
          Components: Core Framework
    Affects Versions: 1.7.1
            Reporter: Vadim


*Bug description:*

When parsing a CSV file in RFC 4180 format in which one of the fields is a single double-quote character, CSVRecordReader fails with the following exception:
{quote}java.lang.StringIndexOutOfBoundsException: String index out of range: -1

at java.lang.String.substring(String.java:1967)
at org.apache.nifi.csv.AbstractCSVRecordReader.convert(AbstractCSVRecordReader.java:82)
at org.apache.nifi.csv.CSVRecordReader.nextRecord(CSVRecordReader.java:102)
at org.apache.nifi.serialization.RecordReader.nextRecord(RecordReader.java:50)
at org.apache.nifi.csv.TestCSVRecordReader.testQuote(TestCSVRecordReader.java:610)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
{quote}
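
The top frame of the trace is String.substring, which fails whenever the end index is smaller than the begin index. As a minimal sketch, assuming (this report does not confirm it) that the reader strips enclosing quotes with substring(1, length - 1), a one-character value consisting of a lone double quote triggers the same exception type. The class and method names below are hypothetical and are not NiFi code:

```java
class QuoteStripSketch {

    // Hypothetical naive unquoting: assumes a quoted value always has at least two characters.
    static String stripNaive(String value) {
        if (value.startsWith("\"") && value.endsWith("\"")) {
            // For a value of length 1 this becomes substring(1, 0), which throws.
            return value.substring(1, value.length() - 1);
        }
        return value;
    }

    // Same logic with a length guard, so a lone quote character is returned unchanged.
    static String stripGuarded(String value) {
        if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
            return value.substring(1, value.length() - 1);
        }
        return value;
    }

    public static void main(String[] args) {
        try {
            stripNaive("\"");
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("naive strip failed: " + e.getMessage());
        }
        System.out.println("guarded strip: " + stripGuarded("\""));
    }
}
```

The length guard is only one possible approach; whether it matches the eventual fix in AbstractCSVRecordReader is not established by this report.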
 

Note that, according to RFC 4180:

 
{quote}If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote.
{quote}
[https://tools.ietf.org/html/rfc4180#page-2]

 

Under this rule, a field whose value is a single double-quote character is encoded as:

""""

(4 double quote characters)  
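
The RFC 4180 quoting rule can be sketched as a pair of helper methods. This is an illustration only: the class and method names are hypothetical, and it is neither the NiFi nor the Commons CSV implementation. Encoding doubles each embedded quote and wraps the result in quotes; decoding reverses both steps.

```java
class Rfc4180QuotingSketch {

    // Encode a raw value as a quoted field: double each embedded quote, then enclose in quotes.
    static String encode(String raw) {
        return "\"" + raw.replace("\"", "\"\"") + "\"";
    }

    // Decode a quoted field: drop the enclosing quotes, then collapse each doubled quote.
    // Rejects input that is not a quoted field (shorter than 2 chars or not quote-delimited).
    static String decode(String field) {
        if (field.length() < 2 || !field.startsWith("\"") || !field.endsWith("\"")) {
            throw new IllegalArgumentException("not a quoted field: " + field);
        }
        return field.substring(1, field.length() - 1).replace("\"\"", "\"");
    }

    public static void main(String[] args) {
        String encoded = encode("\"");
        System.out.println(encoded.length());             // 4
        System.out.println(decode(encoded).equals("\"")); // true
    }
}
```

Under this sketch, the raw value of one double quote round-trips through the four-character encoding shown above.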

*How to reproduce*

Add the following method to TestCSVRecordReader.java and run the test:

 
{code:java}
@Test
public void testQuote() throws IOException, MalformedRecordException {
    final CSVFormat format = CSVFormat.RFC4180.withFirstRecordAsHeader().withTrim().withQuote('"');
    // Header row "name", then one record whose only field is a single double-quote character.
    final String text = "\"name\"\n\"\"\"\"";

    final List<RecordField> fields = new ArrayList<>();
    fields.add(new RecordField("name", RecordFieldType.STRING.getDataType()));
    final RecordSchema schema = new SimpleRecordSchema(fields);

    try (final InputStream bais = new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
         final CSVRecordReader reader = new CSVRecordReader(bais, Mockito.mock(ComponentLog.class), schema, format, true, false,
                 RecordFieldType.DATE.getDefaultFormat(), RecordFieldType.TIME.getDefaultFormat(), RecordFieldType.TIMESTAMP.getDefaultFormat(), StandardCharsets.UTF_8.name())) {

        final Record record = reader.nextRecord();
        final String name = (String) record.getValue("name");

        // Expected: the decoded field is exactly one double-quote character.
        assertEquals("\"", name);
    }
}
{code}