Posted to issues@nifi.apache.org by GitBox <gi...@apache.org> on 2021/02/22 16:43:17 UTC

[GitHub] [nifi] ottobackwards commented on a change in pull request #4828: NIFI-8232 CSV Parsers optionally allow/reject duplicate header names

ottobackwards commented on a change in pull request #4828:
URL: https://github.com/apache/nifi/pull/4828#discussion_r580400233



##########
File path: nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/test/java/org/apache/nifi/csv/TestCSVHeaderSchemaStrategy.java
##########
@@ -66,9 +66,37 @@ public void testSimple() throws SchemaNotFoundException, IOException {
             .allMatch(field -> field.getDataType().equals(RecordFieldType.STRING.getDataType())));
     }
 

Review comment:
       Which parser is this testing, Jackson or Apache?

##########
File path: nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/test/java/org/apache/nifi/csv/TestCSVRecordReader.java
##########
@@ -592,6 +585,47 @@ public void testExtraFieldNotInHeader() throws IOException, MalformedRecordExcep
         }
     }
 

Review comment:
       You may want to test with multiple duplicates, e.g. id, name, country, id, name, country.
   Also, I assume case doesn't matter, but that is only an assumption, so a mixed-case header is worth testing too:
   id, name, country, ID, NAME, COUNTRY
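
   To make the case question concrete, here is a minimal standalone sketch of how the two header rows above behave under each assumption. `HeaderCaseCheck` and its methods are hypothetical names for illustration only; they are not part of this PR or of NiFi, and the readers under test may compare names differently.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class HeaderCaseCheck {

    // Hypothetical helper: returns true if any name repeats under the
    // equality rule of the supplied (initially empty) set.
    static boolean hasDuplicates(final List<String> names, final Set<String> seen) {
        for (final String name : names) {
            if (!seen.add(name)) {
                return true;
            }
        }
        return false;
    }

    // "id" and "ID" count as different names.
    static boolean hasDuplicatesCaseSensitive(final List<String> names) {
        return hasDuplicates(names, new HashSet<>());
    }

    // "id" and "ID" count as the same name.
    static boolean hasDuplicatesIgnoringCase(final List<String> names) {
        return hasDuplicates(names, new TreeSet<>(String.CASE_INSENSITIVE_ORDER));
    }
}
```

   With an exact-match comparison the mixed-case row is not treated as containing duplicates, while a case-insensitive comparison flags it, so a test for that row pins down which behavior is intended.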

##########
File path: nifi-nar-bundles/nifi-extension-utils/nifi-record-utils/nifi-standard-record-utils/src/main/java/org/apache/nifi/csv/CSVUtils.java
##########
@@ -136,6 +136,15 @@
         .defaultValue("UTF-8")
         .required(true)
         .build();
+    public static final PropertyDescriptor ALLOW_DUPLICATE_HEADER_NAMES = new PropertyDescriptor.Builder()
+        .name("csvutils-allow-duplicate-header-names")
+        .displayName("Allow Duplicate Header Names")

Review comment:
       Maybe this should also say what happens when duplicate headers *are* found.

##########
File path: nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/csv/JacksonCSVRecordReader.java
##########
@@ -108,6 +112,17 @@ public Record nextRecord(final boolean coerceTypes, final boolean dropUnknownFie
                     rawFieldNames = schema.getFieldNames();
                 } else {
                     rawFieldNames = Arrays.asList(csvRecord);
+                    if (rawFieldNames.size() > schema.getFieldCount() && !allowDuplicateHeaderNames) {
+                        final Set<String> deDupe = new HashSet<>(schema.getFieldCount());

Review comment:
       So, if I have multiple duplicate names, I'm going to have to work through these errors one by one, field by field.
   Have you thought about tracking the duplicates, throwing once if there are any, and including all the duplicate fields in that exception?
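
   A rough sketch of that idea, written as standalone Java rather than against the reader code in this PR. `DuplicateHeaderCheck`, `findDuplicates`, and `rejectDuplicates` are hypothetical names, and `IllegalArgumentException` is only a stand-in for whatever exception type the reader should actually throw.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class DuplicateHeaderCheck {

    // Hypothetical helper: returns each header name that occurs more than once,
    // listed once, in the order its first repeat was encountered.
    static List<String> findDuplicates(final List<String> headerNames) {
        final Set<String> seen = new HashSet<>();
        final Set<String> duplicates = new LinkedHashSet<>();
        for (final String name : headerNames) {
            if (!seen.add(name)) {
                duplicates.add(name);
            }
        }
        return new ArrayList<>(duplicates);
    }

    // Throws a single exception naming every duplicate, instead of failing
    // on the first one found. The exception type is an assumption.
    static void rejectDuplicates(final List<String> headerNames) {
        final List<String> duplicates = findDuplicates(headerNames);
        if (!duplicates.isEmpty()) {
            throw new IllegalArgumentException(
                    "Duplicate header names found: " + String.join(", ", duplicates));
        }
    }
}
```

   The whole header is scanned before anything is thrown, so a file with several repeated columns reports all of them in one message and can be fixed in one pass.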



