You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@nifi.apache.org by GitBox <gi...@apache.org> on 2022/07/22 06:20:08 UTC

[GitHub] [nifi] timeabarna opened a new pull request, #6234: NIFI-10256 CSVRecordReader using RFC 4180 CSV format trimming startin…

timeabarna opened a new pull request, #6234:
URL: https://github.com/apache/nifi/pull/6234

   …g and ending double quotes
   
   # Summary
   
   [NIFI-10256](https://issues.apache.org/jira/browse/NIFI-10256)
   
   # Tracking
   
   Please complete the following tracking steps prior to pull request creation.
   
   ### Issue Tracking
   
   - [ ] [Apache NiFi Jira](https://issues.apache.org/jira/browse/NIFI) issue created
   
   ### Pull Request Tracking
   
   - [ ] Pull Request title starts with Apache NiFi Jira issue number, such as `NIFI-00000`
   - [ ] Pull Request commit message starts with Apache NiFi Jira issue number, as such `NIFI-00000`
   
   ### Pull Request Formatting
   
   - [ ] Pull Request based on current revision of the `main` branch
   - [ ] Pull Request refers to a feature branch with one commit containing changes
   
   # Verification
   
   Please indicate the verification steps performed prior to pull request creation.
   
   ### Build
   
   - [ ] Build completed using `mvn clean install -P contrib-check`
     - [ ] JDK 8
     - [ ] JDK 11
     - [ ] JDK 17
   
   ### Licensing
   
   - [ ] New dependencies are compatible with the [Apache License 2.0](https://apache.org/licenses/LICENSE-2.0) according to the [License Policy](https://www.apache.org/legal/resolved.html)
   - [ ] New dependencies are documented in applicable `LICENSE` and `NOTICE` files
   
   ### Documentation
   
   - [ ] Documentation formatting appears as expected in rendered files
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@nifi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [nifi] timeabarna commented on pull request #6234: NIFI-10256 CSVRecordReader using RFC 4180 CSV format trimming startin…

Posted by GitBox <gi...@apache.org>.
timeabarna commented on PR #6234:
URL: https://github.com/apache/nifi/pull/6234#issuecomment-1204823360

   Thanks @pgyori for your review, I've modified the code based on your recommendations.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@nifi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [nifi] asfgit closed pull request #6234: NIFI-10256 CSVRecordReader using RFC 4180 CSV format trimming startin…

Posted by GitBox <gi...@apache.org>.
asfgit closed pull request #6234: NIFI-10256 CSVRecordReader using RFC 4180 CSV format trimming startin…
URL: https://github.com/apache/nifi/pull/6234


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@nifi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [nifi] timeabarna commented on a diff in pull request #6234: NIFI-10256 CSVRecordReader using RFC 4180 CSV format trimming startin…

Posted by GitBox <gi...@apache.org>.
timeabarna commented on code in PR #6234:
URL: https://github.com/apache/nifi/pull/6234#discussion_r939836625


##########
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/csv/CSVReader.java:
##########
@@ -79,6 +79,18 @@ public class CSVReader extends SchemaRegistryService implements RecordReaderFact
             .required(true)
             .build();
 
+    public static final PropertyDescriptor TRIM_DOUBLE_QUOTE = new PropertyDescriptor.Builder()
+            .name("Trim double quote")
+            .description("Whether or not to trim starting and ending double quotes. For example: with trim string '\"test\"'"
+                    +" would be parsed to 'test', without trim would be parsed to '\"test\"'."
+                    + "If set to 'false' it means full compliance with RFC-4180. Default value is true, with trim.")
+            .expressionLanguageSupported(ExpressionLanguageScope.NONE)
+            .allowableValues("true", "false")
+            .defaultValue("true")
+            .dependsOn(CSVUtils.CSV_FORMAT, CSVUtils.RFC_4180)

Review Comment:
   Thanks @simonbence, validator added, please merge it



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@nifi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [nifi] pgyori commented on a diff in pull request #6234: NIFI-10256 CSVRecordReader using RFC 4180 CSV format trimming startin…

Posted by GitBox <gi...@apache.org>.
pgyori commented on code in PR #6234:
URL: https://github.com/apache/nifi/pull/6234#discussion_r936709751


##########
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/csv/CSVReader.java:
##########
@@ -79,6 +81,17 @@ public class CSVReader extends SchemaRegistryService implements RecordReaderFact
             .required(true)
             .build();
 
+    public static final PropertyDescriptor TRIM_DOUBLE_QUOTE = new PropertyDescriptor.Builder()
+            .name("Trim double quote")
+            .description("Whether or not to trim starting and ending double quotes. For example: with trim string '\"test\"'"

Review Comment:
   I think it would be useful to add to the description something like
   "if set to 'false' it means full compliance with RFC-4180".
   This is because a new user who hasn't used this before selects RFC-4180 as the CSV format, they would probably only want to know what value of this property guarantees that.



##########
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/test/java/org/apache/nifi/csv/TestJacksonCSVRecordReader.java:
##########
@@ -57,9 +57,9 @@ private List<RecordField> getDefaultFields() {
         return fields;
     }
 
-    private JacksonCSVRecordReader createReader(final InputStream in, final RecordSchema schema, CSVFormat format) throws IOException {
+    private JacksonCSVRecordReader createReader(final InputStream in, final RecordSchema schema, CSVFormat format, final boolean trimDoubleQuote) throws IOException {

Review Comment:
   Just like in the case of the constructor of CSVRecordReader, I recommend overloading this method with a variant that does not contain the `trimDoubleQuote` parameter, and it should call this method with `trimDoubleQuote=true`.



##########
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/csv/CSVRecordReader.java:
##########
@@ -44,11 +44,12 @@
 
 public class CSVRecordReader extends AbstractCSVRecordReader {
     private final CSVParser csvParser;
+    private final boolean trimDoubleQuote;
 
     private List<RecordField> recordFields;
 
     public CSVRecordReader(final InputStream in, final ComponentLog logger, final RecordSchema schema, final CSVFormat csvFormat, final boolean hasHeader, final boolean ignoreHeader,

Review Comment:
   I recommend reintroducing the declaration of the original constructor, and the implementation should just call this constructor with `trimDoubleQuote=true`. This way everywhere where you called this constructor with `trimDoubleQuote=true`, you could revert to calling the original constructor.



##########
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/csv/JacksonCSVRecordReader.java:
##########
@@ -52,11 +52,12 @@ public class JacksonCSVRecordReader extends AbstractCSVRecordReader {
     private final MappingIterator<String[]> recordStream;
     private List<String> rawFieldNames = null;
     private boolean allowDuplicateHeaderNames;
+    private final boolean trimDoubleQuote;
 
     private volatile static CsvMapper mapper = new CsvMapper().enable(CsvParser.Feature.WRAP_AS_ARRAY);
 
     public JacksonCSVRecordReader(final InputStream in, final ComponentLog logger, final RecordSchema schema, final CSVFormat csvFormat, final boolean hasHeader, final boolean ignoreHeader,

Review Comment:
   Same as in the case of CSVRecordReader: I recommend reintroducing the declaration of the original constructor, and the implementation should just call this constructor with trimDoubleQuote=true. This way everywhere where you called this constructor with trimDoubleQuote=true, you could revert to calling the original constructor.



##########
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/csv/CSVReader.java:
##########
@@ -50,6 +50,8 @@
 import java.util.List;
 import java.util.Map;
 
+import static org.apache.nifi.csv.CSVUtils.CSV_FORMAT;

Review Comment:
   I believe this is an accidental auto-import. Could you please remove it?



##########
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/test/java/org/apache/nifi/csv/TestCSVRecordReader.java:
##########
@@ -63,9 +63,9 @@ private List<RecordField> getDefaultFields() {
         return fields;
     }
 
-    private CSVRecordReader createReader(final InputStream in, final RecordSchema schema, CSVFormat format) throws IOException {
+    private CSVRecordReader createReader(final InputStream in, final RecordSchema schema, CSVFormat format, final boolean trimDoubleQuote) throws IOException {

Review Comment:
   Just like in the case of the constructor of CSVRecordReader, I recommend overloading this method with a variant that does not contain the `trimDoubleQuote` parameter, and it should call this method with `trimDoubleQuote=true`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@nifi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [nifi] simonbence commented on a diff in pull request #6234: NIFI-10256 CSVRecordReader using RFC 4180 CSV format trimming startin…

Posted by GitBox <gi...@apache.org>.
simonbence commented on code in PR #6234:
URL: https://github.com/apache/nifi/pull/6234#discussion_r938756002


##########
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/csv/CSVReader.java:
##########
@@ -79,6 +79,18 @@ public class CSVReader extends SchemaRegistryService implements RecordReaderFact
             .required(true)
             .build();
 
+    public static final PropertyDescriptor TRIM_DOUBLE_QUOTE = new PropertyDescriptor.Builder()
+            .name("Trim double quote")
+            .description("Whether or not to trim starting and ending double quotes. For example: with trim string '\"test\"'"
+                    +" would be parsed to 'test', without trim would be parsed to '\"test\"'."
+                    + "If set to 'false' it means full compliance with RFC-4180. Default value is true, with trim.")
+            .expressionLanguageSupported(ExpressionLanguageScope.NONE)
+            .allowableValues("true", "false")
+            .defaultValue("true")
+            .dependsOn(CSVUtils.CSV_FORMAT, CSVUtils.RFC_4180)

Review Comment:
   Please add validator and displya name, something like:
   ```
   .name("trim-double-quotes")
   .displayName("Trim double quotes")
   .addValidator(StandardValidators.BOOLEAN_VALIDATOR)
   ```
   
   Otherwise it looks good and after this change I will merge it



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@nifi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org