Posted to issues@nifi.apache.org by vadimar <gi...@git.apache.org> on 2018/11/05 12:29:51 UTC

[GitHub] nifi pull request #3128: NIFI-5788: Introduce batch size limit in PutDatabas...

GitHub user vadimar opened a pull request:

    https://github.com/apache/nifi/pull/3128

    NIFI-5788: Introduce batch size limit in PutDatabaseRecord processor

    Thank you for submitting a contribution to Apache NiFi.
    
    In order to streamline the review of the contribution we ask you
    to ensure the following steps have been taken:
    
    ### For all changes:
    - [ ] Is there a JIRA ticket associated with this PR? Is it referenced 
         in the commit message?
    
    - [ ] Does your PR title start with NIFI-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
    
    - [ ] Has your PR been rebased against the latest commit within the target branch (typically master)?
    
    - [ ] Is your initial contribution a single, squashed commit?
    
    ### For code changes:
    - [ ] Have you ensured that the full suite of tests is executed via mvn -Pcontrib-check clean install at the root nifi folder?
    - [ ] Have you written or updated unit tests to verify your changes?
    - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)? 
    - [ ] If applicable, have you updated the LICENSE file, including the main LICENSE file under nifi-assembly?
    - [ ] If applicable, have you updated the NOTICE file, including the main NOTICE file found under nifi-assembly?
    - [ ] If adding new Properties, have you added .displayName in addition to .name (programmatic access) for each of the new properties?
    
    ### For documentation related changes:
    - [ ] Have you ensured that format looks appropriate for the output in which it is rendered?
    
    ### Note:
    Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vadimar/nifi-1 nifi-5788

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/nifi/pull/3128.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3128
    
----
commit 2f36c8b1a732e249238f5f6f53968e84c05b497c
Author: vadimar <va...@...>
Date:   2018-11-05T11:15:12Z

    NIFI-5788: Introduce batch size limit in PutDatabaseRecord processor

----


---

[GitHub] nifi issue #3128: NIFI-5788: Introduce batch size limit in PutDatabaseRecord...

Posted by vadimar <gi...@git.apache.org>.
Github user vadimar commented on the issue:

    https://github.com/apache/nifi/pull/3128
  
    Hi,
    Can you please review the latest commits? I committed the changes that address all the issues raised by reviewers.
    Thanks 


---

[GitHub] nifi pull request #3128: NIFI-5788: Introduce batch size limit in PutDatabas...

Posted by mattyb149 <gi...@git.apache.org>.
Github user mattyb149 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/3128#discussion_r230812123
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/PutDatabaseRecord.java ---
    @@ -265,6 +265,17 @@
                 .expressionLanguageSupported(ExpressionLanguageScope.VARIABLE_REGISTRY)
                 .build();
     
    +    static final PropertyDescriptor BATCH_SIZE = new PropertyDescriptor.Builder()
    +            .name("put-db-record-batch-size")
    +            .displayName("Bulk Size")
    +            .description("Specifies batch size for INSERT and UPDATE statements. This parameter has no effect for other statements specified in 'Statement Type'."
    +                    + " Non-positive value has the effect of infinite bulk size.")
    +            .defaultValue("-1")
    --- End diff --
    
    What does a value of zero do? Would anyone ever use it? If not, perhaps zero is the best default to indicate infinite bulk size. If you do change it to zero, please change the validator to a NONNEGATIVE_INTEGER_VALIDATOR to match.


---

[GitHub] nifi pull request #3128: NIFI-5788: Introduce batch size limit in PutDatabas...

Posted by patricker <gi...@git.apache.org>.
Github user patricker commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/3128#discussion_r230917511
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/PutDatabaseRecord.java ---
    @@ -669,11 +685,20 @@ private void executeDML(ProcessContext context, ProcessSession session, FlowFile
                             }
                         }
                         ps.addBatch();
    +                    if (++currentBatchSize == batchSize) {
    --- End diff --
    
    Would it be beneficial to capture `currentBatchSize*batchIndex` as an attribute, with `batchIndex` being incremented only after a successful call to `executeBatch()`? My thinking is that if you have a failure and only part of a batch was loaded, you could store how many rows were loaded successfully as an attribute.


---

[GitHub] nifi pull request #3128: NIFI-5788: Introduce batch size limit in PutDatabas...

Posted by vadimar <gi...@git.apache.org>.
Github user vadimar commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/3128#discussion_r231088684
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/PutDatabaseRecord.java ---
    @@ -669,11 +685,20 @@ private void executeDML(ProcessContext context, ProcessSession session, FlowFile
                             }
                         }
                         ps.addBatch();
    +                    if (++currentBatchSize == batchSize) {
    --- End diff --
    
    I'm not sure this would be beneficial. PutDatabaseRecord works without autoCommit. It's all or nothing.
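
The "all or nothing" point can be illustrated with a small, self-contained sketch. It does not use NiFi or JDBC; `ToyTable` and `load` are hypothetical names that only imitate a connection with autoCommit turned off, where rows from earlier batches stay pending until a final commit and are discarded on rollback.

```java
// Hypothetical illustration only: ToyTable imitates a connection with
// autoCommit disabled. It is not NiFi code and not a real JDBC connection.
final class AllOrNothingSketch {

    /** Toy in-memory table that stages rows until commit() is called. */
    static final class ToyTable {
        private int committedRows = 0;
        private int pendingRows = 0;

        /** Stages one batch of rows, or throws to simulate a failed batch. */
        void executeBatch(int rows, boolean fail) {
            if (fail) {
                throw new IllegalStateException("simulated batch failure");
            }
            pendingRows += rows;
        }

        void commit() {
            committedRows += pendingRows;
            pendingRows = 0;
        }

        void rollback() {
            pendingRows = 0;
        }

        int committedRows() {
            return committedRows;
        }
    }

    /**
     * Runs batchCount batches of batchSize rows each. The batch whose index
     * equals failingBatch fails; pass -1 for a run without failures.
     * Returns the number of rows visible in the table afterwards.
     */
    static int load(ToyTable table, int batchCount, int batchSize, int failingBatch) {
        try {
            for (int b = 0; b < batchCount; b++) {
                table.executeBatch(batchSize, b == failingBatch);
            }
            table.commit();   // only reached when every batch succeeded
        } catch (IllegalStateException e) {
            table.rollback(); // earlier successful batches are discarded too
        }
        return table.committedRows();
    }
}
```

In this toy model, four batches of 100 rows leave 400 rows committed when nothing fails, and 0 rows when the third batch fails, even though the first two batches executed successfully.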


---

[GitHub] nifi pull request #3128: NIFI-5788: Introduce batch size limit in PutDatabas...

Posted by vadimar <gi...@git.apache.org>.
Github user vadimar commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/3128#discussion_r231086664
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/PutDatabaseRecord.java ---
    @@ -265,6 +265,17 @@
                 .expressionLanguageSupported(ExpressionLanguageScope.VARIABLE_REGISTRY)
                 .build();
     
    +    static final PropertyDescriptor BATCH_SIZE = new PropertyDescriptor.Builder()
    --- End diff --
    
    Agree regarding "Maximum Batch Size". Sounds better. What's "bulk size"? Is it relevant to this change?


---

[GitHub] nifi pull request #3128: NIFI-5788: Introduce batch size limit in PutDatabas...

Posted by mattyb149 <gi...@git.apache.org>.
Github user mattyb149 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/3128#discussion_r230811717
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/PutDatabaseRecord.java ---
    @@ -265,6 +265,17 @@
                 .expressionLanguageSupported(ExpressionLanguageScope.VARIABLE_REGISTRY)
                 .build();
     
    +    static final PropertyDescriptor BATCH_SIZE = new PropertyDescriptor.Builder()
    --- End diff --
    
    We should be consistent here with "batch size" and "bulk size" in the naming of variables, documentation, etc. Maybe "Maximum Batch Size"?


---

[GitHub] nifi pull request #3128: NIFI-5788: Introduce batch size limit in PutDatabas...

Posted by patricker <gi...@git.apache.org>.
Github user patricker commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/3128#discussion_r230916140
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/PutDatabaseRecord.java ---
    @@ -265,6 +265,17 @@
                 .expressionLanguageSupported(ExpressionLanguageScope.VARIABLE_REGISTRY)
                 .build();
     
    +    static final PropertyDescriptor BATCH_SIZE = new PropertyDescriptor.Builder()
    +            .name("put-db-record-batch-size")
    +            .displayName("Bulk Size")
    +            .description("Specifies batch size for INSERT and UPDATE statements. This parameter has no effect for other statements specified in 'Statement Type'."
    +                    + " Non-positive value has the effect of infinite bulk size.")
    +            .defaultValue("-1")
    --- End diff --
    
    I agree that `0` should be the default; it would replicate the current behavior of the processor: "All records in one batch".
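
A minimal sketch of why `0` can stand for "all records in one batch", assuming the flush check has the `++currentBatchSize == batchSize` form shown in the diff. `BatchSink`, `CountingSink` and `writeRecords` are hypothetical stand-ins for the two `PreparedStatement` calls involved; the counter reset and the trailing flush are assumptions, not the processor's actual code.

```java
// Hypothetical sketch: BatchSink stands in for the addBatch()/executeBatch()
// calls on a JDBC PreparedStatement. This is not the PutDatabaseRecord source.
final class BatchingSketch {

    /** Minimal stand-in for the two statement calls used in the diff. */
    interface BatchSink {
        void addBatch();
        void executeBatch();
    }

    /** Sink that only counts calls, so the batching behavior can be observed. */
    static final class CountingSink implements BatchSink {
        int added = 0;
        int executed = 0;

        @Override
        public void addBatch() {
            added++;
        }

        @Override
        public void executeBatch() {
            executed++;
        }
    }

    /**
     * Adds recordCount records to the sink and flushes whenever the running
     * count reaches maxBatchSize. The counter is at least 1 when compared, so
     * a maxBatchSize of 0 never matches and all records go out in the single
     * trailing flush. Returns the number of executeBatch() calls made.
     */
    static int writeRecords(int recordCount, int maxBatchSize, BatchSink sink) {
        int currentBatchSize = 0;
        int flushes = 0;
        for (int i = 0; i < recordCount; i++) {
            sink.addBatch();
            if (++currentBatchSize == maxBatchSize) {
                sink.executeBatch();
                flushes++;
                currentBatchSize = 0; // assumed reset after each full batch
            }
        }
        if (currentBatchSize > 0) {
            sink.executeBatch();      // assumed flush of the final partial batch
            flushes++;
        }
        return flushes;
    }
}
```

Under these assumptions, `writeRecords(10, 3, sink)` makes four `executeBatch()` calls (3 + 3 + 3 + 1 records), while `writeRecords(10, 0, sink)` makes exactly one.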


---

[GitHub] nifi pull request #3128: NIFI-5788: Introduce batch size limit in PutDatabas...

Posted by vadimar <gi...@git.apache.org>.
Github user vadimar commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/3128#discussion_r231089816
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/PutDatabaseRecord.java ---
    @@ -265,6 +265,17 @@
                 .expressionLanguageSupported(ExpressionLanguageScope.VARIABLE_REGISTRY)
                 .build();
     
    +    static final PropertyDescriptor BATCH_SIZE = new PropertyDescriptor.Builder()
    --- End diff --
    
    Oh. I see it now. The display label is "Bulk Size". I'll fix it to be "Maximum Batch Size". Thanks


---

[GitHub] nifi pull request #3128: NIFI-5788: Introduce batch size limit in PutDatabas...

Posted by patricker <gi...@git.apache.org>.
Github user patricker commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/3128#discussion_r231153599
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/PutDatabaseRecord.java ---
    @@ -669,11 +685,20 @@ private void executeDML(ProcessContext context, ProcessSession session, FlowFile
                             }
                         }
                         ps.addBatch();
    +                    if (++currentBatchSize == batchSize) {
    --- End diff --
    
    True, I missed that override before, but I see it now. So it is definitely less valuable; the only thing it would provide is troubleshooting guidance: "your bad data is roughly in this part of the file". Probably not worth it. Thanks!
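
For reference, the troubleshooting hint mentioned here amounts to simple arithmetic. The sketch below is hypothetical and was not part of the pull request: `failedRange` assumes a positive batch size and a count of batches that completed before the failure (the `batchIndex` from the earlier comment), and returns the record range covered by the batch that failed.

```java
// Hypothetical helper, not part of the pull request: maps a count of
// completed batches to the record range of the batch that failed.
final class FailedBatchRange {

    /**
     * Returns {firstIndexInclusive, lastIndexExclusive} of the failed batch.
     * completedBatches is the number of executeBatch() calls that succeeded
     * before the failure; batchSize must be positive.
     */
    static long[] failedRange(long completedBatches, long batchSize, long totalRecords) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        long start = completedBatches * batchSize;
        // The last batch may be partial, so clamp to the record count.
        long end = Math.min(start + batchSize, totalRecords);
        return new long[] {start, end};
    }
}
```

With a batch size of 100 and 250 records in total, a failure after two completed batches points at records 200 up to (but not including) 250.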


---

[GitHub] nifi issue #3128: NIFI-5788: Introduce batch size limit in PutDatabaseRecord...

Posted by mattyb149 <gi...@git.apache.org>.
Github user mattyb149 commented on the issue:

    https://github.com/apache/nifi/pull/3128
  
    +1 LGTM, tested with various batch sizes and ran unit tests. Thanks for this improvement! Merged to master.


---

[GitHub] nifi pull request #3128: NIFI-5788: Introduce batch size limit in PutDatabas...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/nifi/pull/3128


---

[GitHub] nifi pull request #3128: NIFI-5788: Introduce batch size limit in PutDatabas...

Posted by vadimar <gi...@git.apache.org>.
Github user vadimar commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/3128#discussion_r231087439
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/PutDatabaseRecord.java ---
    @@ -265,6 +265,17 @@
                 .expressionLanguageSupported(ExpressionLanguageScope.VARIABLE_REGISTRY)
                 .build();
     
    +    static final PropertyDescriptor BATCH_SIZE = new PropertyDescriptor.Builder()
    +            .name("put-db-record-batch-size")
    +            .displayName("Bulk Size")
    +            .description("Specifies batch size for INSERT and UPDATE statements. This parameter has no effect for other statements specified in 'Statement Type'."
    +                    + " Non-positive value has the effect of infinite bulk size.")
    +            .defaultValue("-1")
    --- End diff --
    
    I'll change the default to be zero and the validator to NONNEGATIVE_INTEGER_VALIDATOR
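
The effect of the agreed change on accepted values can be sketched as follows. This is a stand-alone approximation of a non-negative integer check with `0` meaning "no limit"; `MaxBatchSizeSetting`, `isValid` and `isUnlimited` are hypothetical names, and the code does not call NiFi's validator classes.

```java
// Hypothetical stand-in for a non-negative integer validator; it only
// approximates the agreed behavior and does not use NiFi's validators.
final class MaxBatchSizeSetting {

    /** Accepts "0", "1", "500", ...; rejects negatives, text and null. */
    static boolean isValid(String value) {
        if (value == null) {
            return false;
        }
        try {
            return Integer.parseInt(value.trim()) >= 0;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    /** Zero is the agreed default and means "all records in one batch". */
    static boolean isUnlimited(int maxBatchSize) {
        return maxBatchSize == 0;
    }
}
```

With this check the previous default of `-1` would no longer be accepted, while `0` is valid and selects the unlimited behavior.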


---