You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@nifi.apache.org by GitBox <gi...@apache.org> on 2020/01/08 15:42:30 UTC

[GitHub] [nifi] tpalfy opened a new pull request #3966: NIFI-6992 - Add "Batch Size" property to GetHDFSFileInfo processor

tpalfy opened a new pull request #3966: NIFI-6992 - Add "Batch Size" property to GetHDFSFileInfo processor
URL: https://github.com/apache/nifi/pull/3966
 
 
   Added "Batch Size property", takes effect only if "Grouping" is set to "None"
   
   ### For all changes:
   - [ ] Is there a JIRA ticket associated with this PR? Is it referenced 
        in the commit message?
   
   - [ ] Does your PR title start with **NIFI-XXXX** where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
   
   - [ ] Has your PR been rebased against the latest commit within the target branch (typically `master`)?
   
   - [ ] Is your initial contribution a single, squashed commit? _Additional commits in response to PR reviewer feedback should be made on this branch and pushed to allow change tracking. Do not `squash` or use `--force` when pushing to allow for clean monitoring of changes._
   
   ### For code changes:
   - [ ] Have you ensured that the full suite of tests is executed via `mvn -Pcontrib-check clean install` at the root `nifi` folder?
   - [ ] Have you written or updated unit tests to verify your changes?
   - [ ] Have you verified that the full build is successful on both JDK 8 and JDK 11?
   - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)? 
   - [ ] If applicable, have you updated the `LICENSE` file, including the main `LICENSE` file under `nifi-assembly`?
   - [ ] If applicable, have you updated the `NOTICE` file, including the main `NOTICE` file found under `nifi-assembly`?
   - [ ] If adding new Properties, have you added `.displayName` in addition to .name (programmatic access) for each of the new properties?
   
   ### For documentation related changes:
   - [ ] Have you ensured that format looks appropriate for the output in which it is rendered?
   
   ### Note:
   Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [nifi] turcsanyip commented on a change in pull request #3966: NIFI-6992 - Add "Batch Size" property to GetHDFSFileInfo processor

Posted by GitBox <gi...@apache.org>.
turcsanyip commented on a change in pull request #3966: NIFI-6992 - Add "Batch Size" property to GetHDFSFileInfo processor
URL: https://github.com/apache/nifi/pull/3966#discussion_r365193100
 
 

 ##########
 File path: nifi-commons/nifi-utils/src/main/java/org/apache/nifi/processor/util/StandardValidators.java
 ##########
 @@ -507,6 +507,20 @@ public ValidationResult validate(final String subject, final String input, final
     // FACTORY METHODS FOR VALIDATORS
     //
     //
+    public static Validator createAllowEmptyValidator(Validator delegate) {
+        return (subject, input, context) -> {
+            ValidationResult validationResult;
+
+            if (input == null || input.equals("")) {
 
 Review comment:
   isEmpty() method of this class could be used.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [nifi] markap14 commented on a change in pull request #3966: NIFI-6992 - Add "Batch Size" property to GetHDFSFileInfo processor

Posted by GitBox <gi...@apache.org>.
markap14 commented on a change in pull request #3966: NIFI-6992 - Add "Batch Size" property to GetHDFSFileInfo processor
URL: https://github.com/apache/nifi/pull/3966#discussion_r365353915
 
 

 ##########
 File path: nifi-nar-bundles/nifi-hadoop-bundle/nifi-hdfs-processors/src/main/java/org/apache/nifi/processors/hadoop/GetHDFSFileInfo.java
 ##########
 @@ -255,6 +270,30 @@ public void onPropertyModified(PropertyDescriptor descriptor, String oldValue, S
         req = null;
     }
 
+    @Override
+    protected Collection<ValidationResult> customValidate(ValidationContext validationContext) {
+        final List<ValidationResult> validationResults = new ArrayList<>(super.customValidate(validationContext));
+
+        String destination = validationContext.getProperty(DESTINATION).getValue();
+        String grouping = validationContext.getProperty(GROUPING).getValue();
+        String batchSize = validationContext.getProperty(BATCH_SIZE).getValue();
+
+        if (
+            (!DESTINATION_CONTENT.getValue().equals(destination) || !GROUP_NONE.getValue().equals(grouping))
+                && !isEmpty(batchSize)
 
 Review comment:
   If we avoid allowing Batch Size to be set to an empty string, as recommended above, we can instead just check `validationContext.getProperty(BATCH_SIZE).isSet()` here. I think this reads more cleanly, and is more inline with the way most processors are written.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [nifi] turcsanyip commented on issue #3966: NIFI-6992 - Add "Batch Size" property to GetHDFSFileInfo processor

Posted by GitBox <gi...@apache.org>.
turcsanyip commented on issue #3966: NIFI-6992 - Add "Batch Size" property to GetHDFSFileInfo processor
URL: https://github.com/apache/nifi/pull/3966#issuecomment-573761640
 
 
   Merging to master.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [nifi] markap14 commented on a change in pull request #3966: NIFI-6992 - Add "Batch Size" property to GetHDFSFileInfo processor

Posted by GitBox <gi...@apache.org>.
markap14 commented on a change in pull request #3966: NIFI-6992 - Add "Batch Size" property to GetHDFSFileInfo processor
URL: https://github.com/apache/nifi/pull/3966#discussion_r365353329
 
 

 ##########
 File path: nifi-nar-bundles/nifi-hadoop-bundle/nifi-hdfs-processors/src/main/java/org/apache/nifi/processors/hadoop/GetHDFSFileInfo.java
 ##########
 @@ -178,6 +183,15 @@
             .defaultValue(GROUP_ALL.getValue())
             .build();
 
+    public static final PropertyDescriptor BATCH_SIZE = new PropertyDescriptor.Builder()
+            .displayName("Batch Size")
+            .name("gethdfsfileinfo-batch-size")
+            .description("Number of records to put into an output flowfile when 'Destination' is set to 'Content'"
+                + " and 'Group Results' is set to 'None'")
+            .required(false)
+            .addValidator(StandardValidators.createAllowEmptyValidator(StandardValidators.POSITIVE_INTEGER_VALIDATOR))
 
 Review comment:
   Can you explain the logic here @tpalfy ? I don't understand why we would ever allow an empty string for this property. It is marked as optional (required=false). So either it should be specified as an integer or not specified. It looks like there is quite a bit of code in this PR (creation of this validator, and several places where it checks if the value is empty) that can be avoided.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [nifi] turcsanyip commented on a change in pull request #3966: NIFI-6992 - Add "Batch Size" property to GetHDFSFileInfo processor

Posted by GitBox <gi...@apache.org>.
turcsanyip commented on a change in pull request #3966: NIFI-6992 - Add "Batch Size" property to GetHDFSFileInfo processor
URL: https://github.com/apache/nifi/pull/3966#discussion_r364750737
 
 

 ##########
 File path: nifi-nar-bundles/nifi-hadoop-bundle/nifi-hdfs-processors/src/main/java/org/apache/nifi/processors/hadoop/GetHDFSFileInfo.java
 ##########
 @@ -178,6 +178,16 @@
             .defaultValue(GROUP_ALL.getValue())
             .build();
 
+    public static final PropertyDescriptor BATCH_SIZE = new PropertyDescriptor.Builder()
+            .displayName("Batch Size")
+            .name("gethdfsfileinfo-batch-size")
+            .description("Number of records to put into an output flowfile when 'Destination' is set to 'Content'"
+                + " and 'Group Results' is set to 'None'")
 
 Review comment:
   It might worth adding a validator to check these rules.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [nifi] turcsanyip commented on a change in pull request #3966: NIFI-6992 - Add "Batch Size" property to GetHDFSFileInfo processor

Posted by GitBox <gi...@apache.org>.
turcsanyip commented on a change in pull request #3966: NIFI-6992 - Add "Batch Size" property to GetHDFSFileInfo processor
URL: https://github.com/apache/nifi/pull/3966#discussion_r365192302
 
 

 ##########
 File path: nifi-nar-bundles/nifi-hadoop-bundle/nifi-hdfs-processors/src/test/java/org/apache/nifi/processors/hadoop/TestGetHDFSFileInfo.java
 ##########
 @@ -76,6 +76,59 @@ public void setup() throws InitializationException {
         runner.setProperty(GetHDFSFileInfo.HADOOP_CONFIGURATION_RESOURCES, "src/test/resources/core-site.xml");
     }
 
+    @Test
+    public void testInvalidBatchSizeWhenDestinationAndGroupingDoesntAllowBatchSize() throws Exception {
+        Arrays.asList("1", "2", "100").forEach(
+            validBatchSize -> {
+                testValidateBatchSize(GetHDFSFileInfo.DESTINATION_ATTRIBUTES, GetHDFSFileInfo.GROUP_ALL, validBatchSize, false);
+                testValidateBatchSize(GetHDFSFileInfo.DESTINATION_ATTRIBUTES, GetHDFSFileInfo.GROUP_PARENT_DIR, validBatchSize, false);
+                testValidateBatchSize(GetHDFSFileInfo.DESTINATION_ATTRIBUTES, GetHDFSFileInfo.GROUP_NONE, validBatchSize, false);
+                testValidateBatchSize(GetHDFSFileInfo.DESTINATION_CONTENT, GetHDFSFileInfo.GROUP_ALL, validBatchSize, false);
+                testValidateBatchSize(GetHDFSFileInfo.DESTINATION_CONTENT, GetHDFSFileInfo.GROUP_PARENT_DIR, validBatchSize, false);
+            }
+        );
+    }
+
+    @Test
+    public void testValidBatchSize() throws Exception {
+        Arrays.asList("1", "2", "100").forEach(
+            validBatchSize -> {
+                testValidateBatchSize(GetHDFSFileInfo.DESTINATION_CONTENT, GetHDFSFileInfo.GROUP_NONE, validBatchSize, true);
+            }
+        );
+
+        Arrays.asList("", null).forEach(
+            emptyBatchSize -> {
+                testValidateBatchSize(GetHDFSFileInfo.DESTINATION_ATTRIBUTES, GetHDFSFileInfo.GROUP_NONE, emptyBatchSize, true);
+                testValidateBatchSize(GetHDFSFileInfo.DESTINATION_ATTRIBUTES, GetHDFSFileInfo.GROUP_PARENT_DIR, emptyBatchSize, true);
+                testValidateBatchSize(GetHDFSFileInfo.DESTINATION_ATTRIBUTES, GetHDFSFileInfo.GROUP_ALL, emptyBatchSize, true);
+                testValidateBatchSize(GetHDFSFileInfo.DESTINATION_CONTENT, GetHDFSFileInfo.GROUP_ALL, emptyBatchSize, true);
+                testValidateBatchSize(GetHDFSFileInfo.DESTINATION_CONTENT, GetHDFSFileInfo.GROUP_PARENT_DIR, emptyBatchSize, true);
 
 Review comment:
   Grouping=NONE + Destination=CONTENT + Batch Size="" / null should also be added as a valid config.
   
   Grouping=NONE + Destination=CONTENT could also be tested with -1 and 0 as invalid configs.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [nifi] turcsanyip commented on a change in pull request #3966: NIFI-6992 - Add "Batch Size" property to GetHDFSFileInfo processor

Posted by GitBox <gi...@apache.org>.
turcsanyip commented on a change in pull request #3966: NIFI-6992 - Add "Batch Size" property to GetHDFSFileInfo processor
URL: https://github.com/apache/nifi/pull/3966#discussion_r365196428
 
 

 ##########
 File path: nifi-commons/nifi-utils/src/main/java/org/apache/nifi/processor/util/StandardValidators.java
 ##########
 @@ -507,6 +507,20 @@ public ValidationResult validate(final String subject, final String input, final
     // FACTORY METHODS FOR VALIDATORS
     //
     //
+    public static Validator createAllowEmptyValidator(Validator delegate) {
 
 Review comment:
   Please add some unit tests for this new validator.
   You can find the existing tests in a different package for some reason: org.apache.nifi.util.validator.TestStandardValidators

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [nifi] turcsanyip commented on issue #3966: NIFI-6992 - Add "Batch Size" property to GetHDFSFileInfo processor

Posted by GitBox <gi...@apache.org>.
turcsanyip commented on issue #3966: NIFI-6992 - Add "Batch Size" property to GetHDFSFileInfo processor
URL: https://github.com/apache/nifi/pull/3966#issuecomment-573156420
 
 
   @markap14 As the batch size only used when the Grouping=NONE and Destination=CONTENT, I suggested having no default value. For me it is more consistent, if a property has no value if that value will not be used. It might be a wrong approach and actually I haven't checked how it is handled in similar cases in NiFi.
   Regarding the empty string, I agree with you, I overlooked it in the review.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [nifi] tpalfy commented on issue #3966: NIFI-6992 - Add "Batch Size" property to GetHDFSFileInfo processor

Posted by GitBox <gi...@apache.org>.
tpalfy commented on issue #3966: NIFI-6992 - Add "Batch Size" property to GetHDFSFileInfo processor
URL: https://github.com/apache/nifi/pull/3966#issuecomment-573157022
 
 
   > Thanks @tpalfy for this improvement - definitely think this will help for users that want to monitor huge HDFS directories! I think we can simplify the code and the user experience a bit by avoiding the use of an empty string for Batch Size, as I indicated in the inline comments. But please do let me know if I misunderstood something.
   > 
   > Another option that may further simplify things would be to set the default value of the Batch Size property 1. Then, in the custom validate, rather than check if it's set, check if the value is 1 or not. That way, you could actually avoid ever even having to check if the value is null, etc. because the validators guarantee that it's a positive integer and the default value guarantees that it is non-null. Up to you if you want to go this route, but I think it simplifies the code (and tends to adhere to common nifi patterns).
   
   Hi @markap14,
   
   Yes, this was my first approach.
   
   However, @turcsanyip suggested we should help the users notice if they misconfigure the processor by making sure that the otherwise valid _'Batch Size'_ values become invalid unless the prerequisites are met (i.e. _'Destination'_ is set to _'Content'_ and _'Grouping'_ is set to _'None'_).
   With that we needed a valid value when the prerequisites are _not_ met, which would be the empty string.
   
   So basically the user should clear the value if it's not going to be used.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [nifi] markap14 commented on issue #3966: NIFI-6992 - Add "Batch Size" property to GetHDFSFileInfo processor

Posted by GitBox <gi...@apache.org>.
markap14 commented on issue #3966: NIFI-6992 - Add "Batch Size" property to GetHDFSFileInfo processor
URL: https://github.com/apache/nifi/pull/3966#issuecomment-573431307
 
 
   @tpalfy personally I am ok with either approach - to simple ignore the value when it doesn't make sense to use it (and document exactly when it will be used) or to require that no value be specified when it won't be used. I prefer the latter. And I think that you, @turcsanyip, and I are all in agreement there.
   
   What I am questioning here is rather the notion of allowing an empty string. I am suggesting that we should not allow an empty string to be set. Rather, it must be set to an integer or it must be unset. NiFi does distinguish between an unset value and a value set to an empty string. So my recommendations are as follows:
   - Just use the POSITIVE_INTEGER_VALIDATOR with the optional property. Do not wrap that validator in one that allows an empty string to be set.
   - In customValidate, require that the Batch Size be unset unless the Grouping and Destination are appropriate.
   
   This approach is very consistent with the user experience offered elsewhere. It means that the user cannot set the Batch Size when it does not apply. It means that if it is set, it MUST be a positive integer. It also means that the code is simplified by not requiring the extra "wrapping" validator and that rather than checking `isEmpty` in a lot of places, we can just check if `null` (or `isSet()`) which tends to lend itself to code that is easier to understand.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [nifi] turcsanyip commented on a change in pull request #3966: NIFI-6992 - Add "Batch Size" property to GetHDFSFileInfo processor

Posted by GitBox <gi...@apache.org>.
turcsanyip commented on a change in pull request #3966: NIFI-6992 - Add "Batch Size" property to GetHDFSFileInfo processor
URL: https://github.com/apache/nifi/pull/3966#discussion_r364752844
 
 

 ##########
 File path: nifi-nar-bundles/nifi-hadoop-bundle/nifi-hdfs-processors/src/test/java/org/apache/nifi/processors/hadoop/TestGetHDFSFileInfo.java
 ##########
 @@ -563,6 +570,135 @@ public void testRecursiveGroupDirToAttributes() throws Exception {
         Assert.assertEquals(matchCount, 5);
     }
 
+    @Test
+    public void testBatchSizeWithGroupAllBatchSizeNull() throws Exception {
+        testBatchSize(null, GetHDFSFileInfo.GROUP_ALL, 1);
+    }
+    @Test
+    public void testBatchSizeWithGroupAllBatchSize1() throws Exception {
+        testBatchSize(1, GetHDFSFileInfo.GROUP_ALL, 1);
+    }
+    @Test
+    public void testBatchSizeWithGroupAllBatchSize3() throws Exception {
+        testBatchSize(3, GetHDFSFileInfo.GROUP_ALL, 1);
+    }
+    @Test
+    public void testBatchSizeWithGroupAllBatchSize100() throws Exception {
+        testBatchSize(100, GetHDFSFileInfo.GROUP_ALL, 1);
+    }
+
+    @Test
+    public void testBatchSizeWithGroupDirBatchSizeNull() throws Exception {
+        testBatchSize(null, GetHDFSFileInfo.GROUP_PARENT_DIR, 5);
+    }
+    @Test
+    public void testBatchSizeWithGroupDirBatchSize1() throws Exception {
+        testBatchSize(1, GetHDFSFileInfo.GROUP_PARENT_DIR, 5);
+    }
+    @Test
+    public void testBatchSizeWithGroupDirBatchSize3() throws Exception {
+        testBatchSize(3, GetHDFSFileInfo.GROUP_PARENT_DIR, 5);
+    }
+    @Test
+    public void testBatchSizeWithGroupDirBatchSize100() throws Exception {
+        testBatchSize(100, GetHDFSFileInfo.GROUP_PARENT_DIR, 5);
+    }
+
+    @Test
+    public void testBatchSizeWithDestAttributeGroupNoneBatchSizeNull() throws Exception {
+        testBatchSize(null, GetHDFSFileInfo.GROUP_NONE, 9);
+    }
+    @Test
+    public void testBatchSizeWithDestAttributeGroupNoneBatchSize1() throws Exception {
+        testBatchSize(1, GetHDFSFileInfo.GROUP_NONE, 9);
+    }
+    @Test
+    public void testBatchSizeWithDestAttributeGroupNoneBatchSize3() throws Exception {
+        testBatchSize(3, GetHDFSFileInfo.GROUP_NONE, 9);
+    }
+    @Test
+    public void testBatchSizeWithDestAttributeGroupNoneBatchSize100() throws Exception {
+        testBatchSize(100, GetHDFSFileInfo.GROUP_NONE, 9);
+    }
+
+    @Test
+    public void testBatchSizeWithDestContentGroupNoneBatchSizeNull() throws Exception {
+        testBatchSize(null, GetHDFSFileInfo.GROUP_NONE, GetHDFSFileInfo.DESTINATION_CONTENT, 9);
+
+        checkContentSizes(Arrays.asList(1, 1, 1, 1, 1, 1, 1, 1, 1));
+    }
+    @Test
+    public void testBatchSizeWithDestContentGroupNoneBatchSize1() throws Exception {
+        testBatchSize(1, GetHDFSFileInfo.GROUP_NONE, GetHDFSFileInfo.DESTINATION_CONTENT, 9);
+        checkContentSizes(Arrays.asList(1, 1, 1, 1, 1, 1, 1, 1, 1));
+    }
+    @Test
+    public void testBatchSizeWithDestContentGroupNoneBatchSize3() throws Exception {
+        testBatchSize(3, GetHDFSFileInfo.GROUP_NONE, GetHDFSFileInfo.DESTINATION_CONTENT, 3);
+        checkContentSizes(Arrays.asList(3, 3, 3));
+    }
+    @Test
+    public void testBatchSizeWithDestContentGroupNoneBatchSize4() throws Exception {
+        testBatchSize(4, GetHDFSFileInfo.GROUP_NONE, GetHDFSFileInfo.DESTINATION_CONTENT, 3);
+        checkContentSizes(Arrays.asList(4, 4, 1));
+    }
+    @Test
+    public void testBatchSizeWithDestContentGroupNoneBatchSize5() throws Exception {
+        testBatchSize(5, GetHDFSFileInfo.GROUP_NONE, GetHDFSFileInfo.DESTINATION_CONTENT, 2);
+        checkContentSizes(Arrays.asList(5, 4));
+    }
+    @Test
+    public void testBatchSizeWithDestContentGroupNoneBatchSize9() throws Exception {
+        testBatchSize(9, GetHDFSFileInfo.GROUP_NONE, GetHDFSFileInfo.DESTINATION_CONTENT, 1);
+        checkContentSizes(Arrays.asList(9));
+    }
+    @Test
+    public void testBatchSizeWithDestContentGroupNoneBatchSize100() throws Exception {
+        testBatchSize(100, GetHDFSFileInfo.GROUP_NONE, GetHDFSFileInfo.DESTINATION_CONTENT, 1);
+        checkContentSizes(Arrays.asList(9));
+    }
+
+    private void testBatchSize(Integer batchSize, AllowableValue grouping, int expectedNrTransferedToSuccess) {
+        testBatchSize(batchSize, grouping, GetHDFSFileInfo.DESTINATION_ATTRIBUTES, expectedNrTransferedToSuccess);
+    }
+
+    private void testBatchSize(Integer batchSize, AllowableValue grouping, AllowableValue destination, int expectedNrTransferedToSuccess) {
 
 Review comment:
   Typo: Transferred

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [nifi] turcsanyip commented on a change in pull request #3966: NIFI-6992 - Add "Batch Size" property to GetHDFSFileInfo processor

Posted by GitBox <gi...@apache.org>.
turcsanyip commented on a change in pull request #3966: NIFI-6992 - Add "Batch Size" property to GetHDFSFileInfo processor
URL: https://github.com/apache/nifi/pull/3966#discussion_r364753767
 
 

 ##########
 File path: nifi-nar-bundles/nifi-hadoop-bundle/nifi-hdfs-processors/src/test/java/org/apache/nifi/processors/hadoop/TestGetHDFSFileInfo.java
 ##########
 @@ -563,6 +570,135 @@ public void testRecursiveGroupDirToAttributes() throws Exception {
         Assert.assertEquals(matchCount, 5);
     }
 
+    @Test
+    public void testBatchSizeWithGroupAllBatchSizeNull() throws Exception {
+        testBatchSize(null, GetHDFSFileInfo.GROUP_ALL, 1);
+    }
+    @Test
+    public void testBatchSizeWithGroupAllBatchSize1() throws Exception {
+        testBatchSize(1, GetHDFSFileInfo.GROUP_ALL, 1);
+    }
+    @Test
+    public void testBatchSizeWithGroupAllBatchSize3() throws Exception {
+        testBatchSize(3, GetHDFSFileInfo.GROUP_ALL, 1);
+    }
+    @Test
+    public void testBatchSizeWithGroupAllBatchSize100() throws Exception {
+        testBatchSize(100, GetHDFSFileInfo.GROUP_ALL, 1);
+    }
+
+    @Test
+    public void testBatchSizeWithGroupDirBatchSizeNull() throws Exception {
+        testBatchSize(null, GetHDFSFileInfo.GROUP_PARENT_DIR, 5);
+    }
+    @Test
+    public void testBatchSizeWithGroupDirBatchSize1() throws Exception {
+        testBatchSize(1, GetHDFSFileInfo.GROUP_PARENT_DIR, 5);
+    }
+    @Test
+    public void testBatchSizeWithGroupDirBatchSize3() throws Exception {
+        testBatchSize(3, GetHDFSFileInfo.GROUP_PARENT_DIR, 5);
+    }
+    @Test
+    public void testBatchSizeWithGroupDirBatchSize100() throws Exception {
 
 Review comment:
   Running the tests above with Destination=Content would also cover more cases.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [nifi] turcsanyip commented on issue #3966: NIFI-6992 - Add "Batch Size" property to GetHDFSFileInfo processor

Posted by GitBox <gi...@apache.org>.
turcsanyip commented on issue #3966: NIFI-6992 - Add "Batch Size" property to GetHDFSFileInfo processor
URL: https://github.com/apache/nifi/pull/3966#issuecomment-573659746
 
 
   @tpalfy Thanks for the refactor.
   
   LGTM +1

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [nifi] asfgit closed pull request #3966: NIFI-6992 - Add "Batch Size" property to GetHDFSFileInfo processor

Posted by GitBox <gi...@apache.org>.
asfgit closed pull request #3966: NIFI-6992 - Add "Batch Size" property to GetHDFSFileInfo processor
URL: https://github.com/apache/nifi/pull/3966
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [nifi] tpalfy commented on issue #3966: NIFI-6992 - Add "Batch Size" property to GetHDFSFileInfo processor

Posted by GitBox <gi...@apache.org>.
tpalfy commented on issue #3966: NIFI-6992 - Add "Batch Size" property to GetHDFSFileInfo processor
URL: https://github.com/apache/nifi/pull/3966#issuecomment-573588616
 
 
   @markap14 I see, we want to distinguish null and empty String. Fair point, I updated the PR.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [nifi] turcsanyip commented on a change in pull request #3966: NIFI-6992 - Add "Batch Size" property to GetHDFSFileInfo processor

Posted by GitBox <gi...@apache.org>.
turcsanyip commented on a change in pull request #3966: NIFI-6992 - Add "Batch Size" property to GetHDFSFileInfo processor
URL: https://github.com/apache/nifi/pull/3966#discussion_r365190713
 
 

 ##########
 File path: nifi-nar-bundles/nifi-hadoop-bundle/nifi-hdfs-processors/src/main/java/org/apache/nifi/processors/hadoop/GetHDFSFileInfo.java
 ##########
 @@ -266,6 +268,31 @@ public void onPropertyModified(PropertyDescriptor descriptor, String oldValue, S
         req = null;
     }
 
+    @Override
+    protected Collection<ValidationResult> customValidate(ValidationContext validationContext) {
+        final List<ValidationResult> validationResults = new ArrayList<>(super.customValidate(validationContext));
+
+        String destination = validationContext.getProperty(DESTINATION).getValue();
+        String grouping = validationContext.getProperty(GROUPING).getValue();
+        String batchSize = validationContext.getProperty(BATCH_SIZE).getValue();
+
+        if (
+            (!DESTINATION_CONTENT.getValue().equals(destination) || !GROUP_NONE.getValue().equals(grouping))
+                && batchSize != null
+                && !batchSize.equals("")
 
 Review comment:
   StringUtils.isEmpty() could be used to simplify the condition.
   We have a StringUtils in NiFi and commons-lang3's StringUtils is also used. The latter is already imported here, so I'd use that one (but I don't know if there is a convention for it).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [nifi] turcsanyip commented on a change in pull request #3966: NIFI-6992 - Add "Batch Size" property to GetHDFSFileInfo processor

Posted by GitBox <gi...@apache.org>.
turcsanyip commented on a change in pull request #3966: NIFI-6992 - Add "Batch Size" property to GetHDFSFileInfo processor
URL: https://github.com/apache/nifi/pull/3966#discussion_r364752633
 
 

 ##########
 File path: nifi-nar-bundles/nifi-hadoop-bundle/nifi-hdfs-processors/src/main/java/org/apache/nifi/processors/hadoop/GetHDFSFileInfo.java
 ##########
 @@ -596,6 +658,23 @@ protected HDFSFileInfoRequest updateRequestDetails(ProcessContext context, Proce
         return req;
     }
 
+    static class ExecutionContext {
+        private int nrOfWaitingHDFSObjects;
+        private FlowFile flowfile;
+
+        public void reset() {
+            nrOfWaitingHDFSObjects = 0;
+            flowfile = null;
+        }
+
+        public void finish(ProcessSession session) {
 
 Review comment:
   Minor: Using the same access modifier (package private is appropriate I think) for the fields and the methods would be more consistent because both the fields and the methods are accessed directly from the outer class.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [nifi] turcsanyip commented on a change in pull request #3966: NIFI-6992 - Add "Batch Size" property to GetHDFSFileInfo processor

Posted by GitBox <gi...@apache.org>.
turcsanyip commented on a change in pull request #3966: NIFI-6992 - Add "Batch Size" property to GetHDFSFileInfo processor
URL: https://github.com/apache/nifi/pull/3966#discussion_r365190890
 
 

 ##########
 File path: nifi-nar-bundles/nifi-hadoop-bundle/nifi-hdfs-processors/src/main/java/org/apache/nifi/processors/hadoop/GetHDFSFileInfo.java
 ##########
 @@ -609,7 +636,11 @@ protected HDFSFileInfoRequest buildRequestDetails(ProcessContext context, Proces
         req.isIgnoreDotDirs = context.getProperty(IGNORE_DOTTED_DIRS).asBoolean();
 
         req.groupping = HDFSFileInfoRequest.Groupping.getEnum(context.getProperty(GROUPING).getValue());
-        req.batchSize = context.getProperty(BATCH_SIZE).asInteger();
+        req.batchSize = Optional.ofNullable(context.getProperty(BATCH_SIZE))
+            .filter(propertyValue -> propertyValue.getValue() != null)
+            .filter(propertyValue -> !propertyValue.getValue().equals(""))
 
 Review comment:
   StringUtils.isEmpty() could be used here too.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [nifi] markap14 commented on issue #3966: NIFI-6992 - Add "Batch Size" property to GetHDFSFileInfo processor

Posted by GitBox <gi...@apache.org>.
markap14 commented on issue #3966: NIFI-6992 - Add "Batch Size" property to GetHDFSFileInfo processor
URL: https://github.com/apache/nifi/pull/3966#issuecomment-573733306
 
 
   Thanks @tpalfy sorry for the lack of clarity there. Looks good to me now too.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [nifi] turcsanyip commented on a change in pull request #3966: NIFI-6992 - Add "Batch Size" property to GetHDFSFileInfo processor

Posted by GitBox <gi...@apache.org>.
turcsanyip commented on a change in pull request #3966: NIFI-6992 - Add "Batch Size" property to GetHDFSFileInfo processor
URL: https://github.com/apache/nifi/pull/3966#discussion_r365187489
 
 

 ##########
 File path: nifi-nar-bundles/nifi-hadoop-bundle/nifi-hdfs-processors/src/main/java/org/apache/nifi/processors/hadoop/GetHDFSFileInfo.java
 ##########
 @@ -183,9 +186,8 @@
             .name("gethdfsfileinfo-batch-size")
             .description("Number of records to put into an output flowfile when 'Destination' is set to 'Content'"
                 + " and 'Group Results' is set to 'None'")
-            .required(true)
-            .addValidator(StandardValidators.NON_NEGATIVE_INTEGER_VALIDATOR)
-            .defaultValue("1")
+            .required(false)
+            .addValidator(StandardValidators.createAllowEmptyValidator(StandardValidators.NON_NEGATIVE_INTEGER_VALIDATOR))
 
 Review comment:
   POSITIVE_INTEGER_VALIDATOR seems to me more reasonable here.
   "0" falls back to batch size = 1, but it might not be straightforward / obvious for the end user.
   The batch size properties of the other processors also have 1 as their lower bound.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [nifi] tpalfy commented on issue #3966: NIFI-6992 - Add "Batch Size" property to GetHDFSFileInfo processor

Posted by GitBox <gi...@apache.org>.
tpalfy commented on issue #3966: NIFI-6992 - Add "Batch Size" property to GetHDFSFileInfo processor
URL: https://github.com/apache/nifi/pull/3966#issuecomment-573020796
 
 
   @turcsanyip Thanks for the comments, I applied all the suggested changes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services