You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@nifi.apache.org by GitBox <gi...@apache.org> on 2022/08/03 14:46:44 UTC

[GitHub] [nifi] pgyori commented on a diff in pull request #6234: NIFI-10256 CSVRecordReader using RFC 4180 CSV format trimming startin…

pgyori commented on code in PR #6234:
URL: https://github.com/apache/nifi/pull/6234#discussion_r936709751


##########
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/csv/CSVReader.java:
##########
@@ -79,6 +81,17 @@ public class CSVReader extends SchemaRegistryService implements RecordReaderFact
             .required(true)
             .build();
 
+    public static final PropertyDescriptor TRIM_DOUBLE_QUOTE = new PropertyDescriptor.Builder()
+            .name("Trim double quote")
+            .description("Whether or not to trim starting and ending double quotes. For example: with trim string '\"test\"'"

Review Comment:
   I think it would be useful to add to the description something like
   "if set to 'false' it means full compliance with RFC-4180".
   This is because a new user who hasn't used this before selects RFC-4180 as the CSV format, they would probably only want to know what value of this property guarantees that.



##########
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/test/java/org/apache/nifi/csv/TestJacksonCSVRecordReader.java:
##########
@@ -57,9 +57,9 @@ private List<RecordField> getDefaultFields() {
         return fields;
     }
 
-    private JacksonCSVRecordReader createReader(final InputStream in, final RecordSchema schema, CSVFormat format) throws IOException {
+    private JacksonCSVRecordReader createReader(final InputStream in, final RecordSchema schema, CSVFormat format, final boolean trimDoubleQuote) throws IOException {

Review Comment:
   Just like in the case of the constructor of CSVRecordReader, I recommend overloading this method with a variant that does not contain the `trimDoubleQuote` parameter, and it should call this method with `trimDoubleQuote=true`.



##########
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/csv/CSVRecordReader.java:
##########
@@ -44,11 +44,12 @@
 
 public class CSVRecordReader extends AbstractCSVRecordReader {
     private final CSVParser csvParser;
+    private final boolean trimDoubleQuote;
 
     private List<RecordField> recordFields;
 
     public CSVRecordReader(final InputStream in, final ComponentLog logger, final RecordSchema schema, final CSVFormat csvFormat, final boolean hasHeader, final boolean ignoreHeader,

Review Comment:
   I recommend reintroducing the declaration of the original constructor, and the implementation should just call this constructor with `trimDoubleQuote=true`. This way everywhere where you called this constructor with `trimDoubleQuote=true`, you could revert to calling the original constructor.



##########
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/csv/JacksonCSVRecordReader.java:
##########
@@ -52,11 +52,12 @@ public class JacksonCSVRecordReader extends AbstractCSVRecordReader {
     private final MappingIterator<String[]> recordStream;
     private List<String> rawFieldNames = null;
     private boolean allowDuplicateHeaderNames;
+    private final boolean trimDoubleQuote;
 
     private volatile static CsvMapper mapper = new CsvMapper().enable(CsvParser.Feature.WRAP_AS_ARRAY);
 
     public JacksonCSVRecordReader(final InputStream in, final ComponentLog logger, final RecordSchema schema, final CSVFormat csvFormat, final boolean hasHeader, final boolean ignoreHeader,

Review Comment:
   Same as in the case of CSVRecordReader: I recommend reintroducing the declaration of the original constructor, and the implementation should just call this constructor with trimDoubleQuote=true. This way everywhere where you called this constructor with trimDoubleQuote=true, you could revert to calling the original constructor.



##########
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/csv/CSVReader.java:
##########
@@ -50,6 +50,8 @@
 import java.util.List;
 import java.util.Map;
 
+import static org.apache.nifi.csv.CSVUtils.CSV_FORMAT;

Review Comment:
   I believe this is an accidental auto-import. Could you please remove it?



##########
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/test/java/org/apache/nifi/csv/TestCSVRecordReader.java:
##########
@@ -63,9 +63,9 @@ private List<RecordField> getDefaultFields() {
         return fields;
     }
 
-    private CSVRecordReader createReader(final InputStream in, final RecordSchema schema, CSVFormat format) throws IOException {
+    private CSVRecordReader createReader(final InputStream in, final RecordSchema schema, CSVFormat format, final boolean trimDoubleQuote) throws IOException {

Review Comment:
   Just like in the case of the constructor of CSVRecordReader, I recommend overloading this method with a variant that does not contain the `trimDoubleQuote` parameter, and it should call this method with `trimDoubleQuote=true`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@nifi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org