You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@nifi.apache.org by cammachusa <gi...@git.apache.org> on 2017/09/18 17:47:59 UTC

[GitHub] nifi pull request #2160: [NiFi-4384] - Enhance PutKudu processor to support ...

GitHub user cammachusa opened a pull request:

    https://github.com/apache/nifi/pull/2160

    [NiFi-4384] - Enhance PutKudu processor to support batch insert

    Thank you for submitting a contribution to Apache NiFi.
    
    In order to streamline the review of the contribution we ask you
    to ensure the following steps have been taken:
    
    ### For all changes:
    - [ ] Is there a JIRA ticket associated with this PR? Is it referenced 
         in the commit message?
    
    - [ ] Does your PR title start with NIFI-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
    
    - [ ] Has your PR been rebased against the latest commit within the target branch (typically master)?
    
    - [ ] Is your initial contribution a single, squashed commit?
    
    ### For code changes:
    - [ ] Have you ensured that the full suite of tests is executed via mvn -Pcontrib-check clean install at the root nifi folder?
    - [ ] Have you written or updated unit tests to verify your changes?
    - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)? 
    - [ ] If applicable, have you updated the LICENSE file, including the main LICENSE file under nifi-assembly?
    - [ ] If applicable, have you updated the NOTICE file, including the main NOTICE file found under nifi-assembly?
    - [ ] If adding new Properties, have you added .displayName in addition to .name (programmatic access) for each of the new properties?
    
    ### For documentation related changes:
    - [ ] Have you ensured that format looks appropriate for the output in which it is rendered?
    
    ### Note:
    Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/InspurUSA/nifi NiFi-4384

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/nifi/pull/2160.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2160
    
----
commit 2ddbeda258421d39b7bd6f1b48cf17ded18743f6
Author: cam <ca...@inspur.com>
Date:   2017-09-14T22:29:08Z

    [NiFi-4384] - Enhance PutKudu processor to support batch insert

----


---

[GitHub] nifi pull request #2160: [NiFi-4384] - Enhance PutKudu processor to support ...

Posted by cammachusa <gi...@git.apache.org>.
Github user cammachusa commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/2160#discussion_r140314045
  
    --- Diff: nifi-nar-bundles/nifi-kudu-bundle/nifi-kudu-processors/src/main/java/org/apache/nifi/processors/kudu/AbstractKudu.java ---
    @@ -94,6 +95,27 @@
                 .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
                 .build();
     
    +    protected static final PropertyDescriptor FLUSH_MODE = new PropertyDescriptor.Builder()
    +            .name("Flush Mode")
    +            .description("Set the new flush mode for a kudu session\n" +
    --- End diff --
    
    @pvillard31 , different methods, archiving the same purpose :-) I don't have any opinion. It was they way Ricky suggested in its initial PR #2020 . I would leave it like that since it looks straightforward :-)


---

[GitHub] nifi issue #2160: [NiFi-4384] - Enhance PutKudu processor to support batch i...

Posted by pvillard31 <gi...@git.apache.org>.
Github user pvillard31 commented on the issue:

    https://github.com/apache/nifi/pull/2160
  
    I fixed a copy/paste error:
    
    ````java
    batchSize = context.getProperty(TABLE_NAME).evaluateAttributeExpressions().asInteger();
    ````
    
    (TABLE_NAME instead of BATCH_SIZE)
    
    And also fixed some typos. LGTM, merging to master. Thanks @cammachusa 


---

[GitHub] nifi pull request #2160: [NiFi-4384] - Enhance PutKudu processor to support ...

Posted by cammachusa <gi...@git.apache.org>.
Github user cammachusa commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/2160#discussion_r140314637
  
    --- Diff: nifi-nar-bundles/nifi-kudu-bundle/nifi-kudu-processors/src/main/java/org/apache/nifi/processors/kudu/AbstractKudu.java ---
    @@ -94,6 +95,27 @@
                 .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
                 .build();
     
    +    protected static final PropertyDescriptor FLUSH_MODE = new PropertyDescriptor.Builder()
    +            .name("Flush Mode")
    +            .description("Set the new flush mode for a kudu session\n" +
    +                    "AUTO_FLUSH_SYNC: the call returns when the operation is persisted, else it throws an exception.\n" +
    +                    "AUTO_FLUSH_BACKGROUND: the call returns when the operation has been added to the buffer. This call should normally perform only fast in-memory" +
    +                    " operations but it may have to wait when the buffer is full and there's another buffer being flushed.\n" +
    +                    "MANUAL_FLUSH: the call returns when the operation has been added to the buffer, else it throws a KuduException if the buffer is full.")
    +            .allowableValues(SessionConfiguration.FlushMode.values())
    +            .defaultValue(SessionConfiguration.FlushMode.AUTO_FLUSH_BACKGROUND.toString())
    +            .required(true)
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .build();
    +
    +    protected static final PropertyDescriptor BATCH_SIZE = new PropertyDescriptor.Builder()
    +            .name("Batch Size")
    +            .description("Set the number of operations that can be buffered")
    +            .defaultValue("100")
    +            .required(true)
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    --- End diff --
    
    Excellent!


---

[GitHub] nifi pull request #2160: [NiFi-4384] - Enhance PutKudu processor to support ...

Posted by cammachusa <gi...@git.apache.org>.
Github user cammachusa commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/2160#discussion_r140340082
  
    --- Diff: nifi-nar-bundles/nifi-kudu-bundle/nifi-kudu-processors/src/main/java/org/apache/nifi/processors/kudu/AbstractKudu.java ---
    @@ -124,11 +148,14 @@ public void OnScheduled(final ProcessContext context) {
                     kuduTable = this.getKuduTable(kuduClient, tableName);
                     getLogger().debug("Kudu connection successfully initialized");
                 }
    +
    --- End diff --
    
    And, feel free to let me know what else should be adjusted


---

[GitHub] nifi pull request #2160: [NiFi-4384] - Enhance PutKudu processor to support ...

Posted by cammachusa <gi...@git.apache.org>.
Github user cammachusa commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/2160#discussion_r140362618
  
    --- Diff: nifi-nar-bundles/nifi-kudu-bundle/nifi-kudu-processors/src/main/java/org/apache/nifi/processors/kudu/AbstractKudu.java ---
    @@ -94,6 +97,29 @@
                 .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
                 .build();
     
    +    protected static final PropertyDescriptor FLUSH_MODE = new PropertyDescriptor.Builder()
    +            .name("Flush Mode")
    +            .description("Set the new flush mode for a kudu session\n" +
    +                    "AUTO_FLUSH_SYNC: the call returns when the operation is persisted, else it throws an exception.\n" +
    +                    "AUTO_FLUSH_BACKGROUND: the call returns when the operation has been added to the buffer. This call should normally perform only fast in-memory" +
    +                    " operations but it may have to wait when the buffer is full and there's another buffer being flushed.\n" +
    +                    "MANUAL_FLUSH: the call returns when the operation has been added to the buffer, else it throws a KuduException if the buffer is full.")
    +            .allowableValues(SessionConfiguration.FlushMode.values())
    +            .defaultValue(SessionConfiguration.FlushMode.AUTO_FLUSH_BACKGROUND.toString())
    +            .required(true)
    +            .build();
    +
    +    protected static final PropertyDescriptor BATCH_SIZE = new PropertyDescriptor.Builder()
    +            .name("Batch Size")
    +            .description("Set the number of operations that can be buffered, between 2 - 100000. " +
    +                    "Depend on your memory size, and data size per row set an appropriate batch size. " +
    +                    "Gradually increase this number to find out your best one for best performance")
    +            .defaultValue("100")
    +            .required(true)
    +            .addValidator(StandardValidators.createLongValidator(2, 100000, true))
    --- End diff --
    
    The value of 1 wouldn't make sense. If set 1, the buffer will always have one item, since its purpose is to queue up coming items. Second, doing so, will significantly degrade the performance. The read always faster than the write.


---

[GitHub] nifi issue #2160: [NiFi-4384] - Enhance PutKudu processor to support batch i...

Posted by cammachusa <gi...@git.apache.org>.
Github user cammachusa commented on the issue:

    https://github.com/apache/nifi/pull/2160
  
    Hi @joewitt and @rickysaltzer , would you like to review this PR about PutKudu again? :-) It's simply added a couple more configurations to help users speed up the ingestion process. Thanks, 


---

[GitHub] nifi pull request #2160: [NiFi-4384] - Enhance PutKudu processor to support ...

Posted by pvillard31 <gi...@git.apache.org>.
Github user pvillard31 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/2160#discussion_r140263816
  
    --- Diff: nifi-nar-bundles/nifi-kudu-bundle/nifi-kudu-processors/src/main/java/org/apache/nifi/processors/kudu/AbstractKudu.java ---
    @@ -94,6 +95,27 @@
                 .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
                 .build();
     
    +    protected static final PropertyDescriptor FLUSH_MODE = new PropertyDescriptor.Builder()
    +            .name("Flush Mode")
    +            .description("Set the new flush mode for a kudu session\n" +
    +                    "AUTO_FLUSH_SYNC: the call returns when the operation is persisted, else it throws an exception.\n" +
    +                    "AUTO_FLUSH_BACKGROUND: the call returns when the operation has been added to the buffer. This call should normally perform only fast in-memory" +
    +                    " operations but it may have to wait when the buffer is full and there's another buffer being flushed.\n" +
    +                    "MANUAL_FLUSH: the call returns when the operation has been added to the buffer, else it throws a KuduException if the buffer is full.")
    +            .allowableValues(SessionConfiguration.FlushMode.values())
    +            .defaultValue(SessionConfiguration.FlushMode.AUTO_FLUSH_BACKGROUND.toString())
    +            .required(true)
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .build();
    +
    +    protected static final PropertyDescriptor BATCH_SIZE = new PropertyDescriptor.Builder()
    +            .name("Batch Size")
    +            .description("Set the number of operations that can be buffered")
    +            .defaultValue("100")
    +            .required(true)
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    --- End diff --
    
    I'd suggest an integer validator with a range. And also add in the description what would be the behaviour if a user sets the value 0.


---

[GitHub] nifi pull request #2160: [NiFi-4384] - Enhance PutKudu processor to support ...

Posted by pvillard31 <gi...@git.apache.org>.
Github user pvillard31 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/2160#discussion_r140265302
  
    --- Diff: nifi-nar-bundles/nifi-kudu-bundle/nifi-kudu-processors/src/main/java/org/apache/nifi/processors/kudu/AbstractKudu.java ---
    @@ -124,11 +148,14 @@ public void OnScheduled(final ProcessContext context) {
                     kuduTable = this.getKuduTable(kuduClient, tableName);
                     getLogger().debug("Kudu connection successfully initialized");
                 }
    +
    --- End diff --
    
    I know this is not part of this PR, but would not it make sense to allow expression language on table name and Kudu masters? (not with an evaluation against flow files but just against the variable registry and such in case someone wants to externalize the values between environments). Same remark for batch size?


---

[GitHub] nifi pull request #2160: [NiFi-4384] - Enhance PutKudu processor to support ...

Posted by pvillard31 <gi...@git.apache.org>.
Github user pvillard31 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/2160#discussion_r140355391
  
    --- Diff: nifi-nar-bundles/nifi-kudu-bundle/nifi-kudu-processors/src/main/java/org/apache/nifi/processors/kudu/AbstractKudu.java ---
    @@ -94,6 +95,27 @@
                 .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
                 .build();
     
    +    protected static final PropertyDescriptor FLUSH_MODE = new PropertyDescriptor.Builder()
    +            .name("Flush Mode")
    +            .description("Set the new flush mode for a kudu session\n" +
    --- End diff --
    
    My only concern is: if we update the Kudu dependency and a new SessionConfiguration.FlushMode option is made available in the library, the contributor bumping the Kudu version will have to remember to update the description otherwise there will be an undocumented option. But I'm fine with this approach: it does not make a huge difference.


---

[GitHub] nifi issue #2160: [NiFi-4384] - Enhance PutKudu processor to support batch i...

Posted by pvillard31 <gi...@git.apache.org>.
Github user pvillard31 commented on the issue:

    https://github.com/apache/nifi/pull/2160
  
    Another comment completely unrelated to this specific PR: could you add the tag ``"record"`` in PutKudu ``@Tags`` annotation to be in line with other record-oriented processors? It'll help users to list the record processors.


---

[GitHub] nifi pull request #2160: [NiFi-4384] - Enhance PutKudu processor to support ...

Posted by cammachusa <gi...@git.apache.org>.
Github user cammachusa commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/2160#discussion_r140320346
  
    --- Diff: nifi-nar-bundles/nifi-kudu-bundle/nifi-kudu-processors/src/main/java/org/apache/nifi/processors/kudu/AbstractKudu.java ---
    @@ -124,11 +148,14 @@ public void OnScheduled(final ProcessContext context) {
                     kuduTable = this.getKuduTable(kuduClient, tableName);
                     getLogger().debug("Kudu connection successfully initialized");
                 }
    +
    --- End diff --
    
    @pvillard31 , I'm not quite sure to understand it. What would you suggest me to change?


---

[GitHub] nifi pull request #2160: [NiFi-4384] - Enhance PutKudu processor to support ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/nifi/pull/2160


---

[GitHub] nifi pull request #2160: [NiFi-4384] - Enhance PutKudu processor to support ...

Posted by pvillard31 <gi...@git.apache.org>.
Github user pvillard31 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/2160#discussion_r140267236
  
    --- Diff: nifi-nar-bundles/nifi-kudu-bundle/nifi-kudu-processors/src/main/java/org/apache/nifi/processors/kudu/AbstractKudu.java ---
    @@ -94,6 +95,27 @@
                 .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
                 .build();
     
    +    protected static final PropertyDescriptor FLUSH_MODE = new PropertyDescriptor.Builder()
    +            .name("Flush Mode")
    +            .description("Set the new flush mode for a kudu session\n" +
    +                    "AUTO_FLUSH_SYNC: the call returns when the operation is persisted, else it throws an exception.\n" +
    +                    "AUTO_FLUSH_BACKGROUND: the call returns when the operation has been added to the buffer. This call should normally perform only fast in-memory" +
    +                    " operations but it may have to wait when the buffer is full and there's another buffer being flushed.\n" +
    +                    "MANUAL_FLUSH: the call returns when the operation has been added to the buffer, else it throws a KuduException if the buffer is full.")
    +            .allowableValues(SessionConfiguration.FlushMode.values())
    +            .defaultValue(SessionConfiguration.FlushMode.AUTO_FLUSH_BACKGROUND.toString())
    +            .required(true)
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    --- End diff --
    
    I don't think the validator is required when you allow a list of values and define a default one.


---

[GitHub] nifi pull request #2160: [NiFi-4384] - Enhance PutKudu processor to support ...

Posted by pvillard31 <gi...@git.apache.org>.
Github user pvillard31 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/2160#discussion_r140353856
  
    --- Diff: nifi-nar-bundles/nifi-kudu-bundle/nifi-kudu-processors/src/main/java/org/apache/nifi/processors/kudu/AbstractKudu.java ---
    @@ -94,6 +97,29 @@
                 .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
                 .build();
     
    +    protected static final PropertyDescriptor FLUSH_MODE = new PropertyDescriptor.Builder()
    +            .name("Flush Mode")
    +            .description("Set the new flush mode for a kudu session\n" +
    +                    "AUTO_FLUSH_SYNC: the call returns when the operation is persisted, else it throws an exception.\n" +
    +                    "AUTO_FLUSH_BACKGROUND: the call returns when the operation has been added to the buffer. This call should normally perform only fast in-memory" +
    +                    " operations but it may have to wait when the buffer is full and there's another buffer being flushed.\n" +
    +                    "MANUAL_FLUSH: the call returns when the operation has been added to the buffer, else it throws a KuduException if the buffer is full.")
    +            .allowableValues(SessionConfiguration.FlushMode.values())
    +            .defaultValue(SessionConfiguration.FlushMode.AUTO_FLUSH_BACKGROUND.toString())
    +            .required(true)
    +            .build();
    +
    +    protected static final PropertyDescriptor BATCH_SIZE = new PropertyDescriptor.Builder()
    +            .name("Batch Size")
    +            .description("Set the number of operations that can be buffered, between 2 - 100000. " +
    +                    "Depend on your memory size, and data size per row set an appropriate batch size. " +
    +                    "Gradually increase this number to find out your best one for best performance")
    +            .defaultValue("100")
    --- End diff --
    
    Looking at the AsyncKuduSession class, it seems the default value is 1000. Any reason to set it to 100?


---

[GitHub] nifi pull request #2160: [NiFi-4384] - Enhance PutKudu processor to support ...

Posted by cammachusa <gi...@git.apache.org>.
Github user cammachusa commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/2160#discussion_r140314337
  
    --- Diff: nifi-nar-bundles/nifi-kudu-bundle/nifi-kudu-processors/src/main/java/org/apache/nifi/processors/kudu/AbstractKudu.java ---
    @@ -94,6 +95,27 @@
                 .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
                 .build();
     
    +    protected static final PropertyDescriptor FLUSH_MODE = new PropertyDescriptor.Builder()
    +            .name("Flush Mode")
    +            .description("Set the new flush mode for a kudu session\n" +
    +                    "AUTO_FLUSH_SYNC: the call returns when the operation is persisted, else it throws an exception.\n" +
    +                    "AUTO_FLUSH_BACKGROUND: the call returns when the operation has been added to the buffer. This call should normally perform only fast in-memory" +
    +                    " operations but it may have to wait when the buffer is full and there's another buffer being flushed.\n" +
    +                    "MANUAL_FLUSH: the call returns when the operation has been added to the buffer, else it throws a KuduException if the buffer is full.")
    +            .allowableValues(SessionConfiguration.FlushMode.values())
    +            .defaultValue(SessionConfiguration.FlushMode.AUTO_FLUSH_BACKGROUND.toString())
    +            .required(true)
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    --- End diff --
    
    Agree!


---

[GitHub] nifi pull request #2160: [NiFi-4384] - Enhance PutKudu processor to support ...

Posted by pvillard31 <gi...@git.apache.org>.
Github user pvillard31 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/2160#discussion_r140353231
  
    --- Diff: nifi-nar-bundles/nifi-kudu-bundle/nifi-kudu-processors/src/main/java/org/apache/nifi/processors/kudu/AbstractKudu.java ---
    @@ -94,6 +97,29 @@
                 .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
                 .build();
     
    +    protected static final PropertyDescriptor FLUSH_MODE = new PropertyDescriptor.Builder()
    +            .name("Flush Mode")
    +            .description("Set the new flush mode for a kudu session\n" +
    +                    "AUTO_FLUSH_SYNC: the call returns when the operation is persisted, else it throws an exception.\n" +
    +                    "AUTO_FLUSH_BACKGROUND: the call returns when the operation has been added to the buffer. This call should normally perform only fast in-memory" +
    +                    " operations but it may have to wait when the buffer is full and there's another buffer being flushed.\n" +
    +                    "MANUAL_FLUSH: the call returns when the operation has been added to the buffer, else it throws a KuduException if the buffer is full.")
    +            .allowableValues(SessionConfiguration.FlushMode.values())
    +            .defaultValue(SessionConfiguration.FlushMode.AUTO_FLUSH_BACKGROUND.toString())
    +            .required(true)
    +            .build();
    +
    +    protected static final PropertyDescriptor BATCH_SIZE = new PropertyDescriptor.Builder()
    +            .name("Batch Size")
    +            .description("Set the number of operations that can be buffered, between 2 - 100000. " +
    +                    "Depend on your memory size, and data size per row set an appropriate batch size. " +
    +                    "Gradually increase this number to find out your best one for best performance")
    +            .defaultValue("100")
    +            .required(true)
    +            .addValidator(StandardValidators.createLongValidator(2, 100000, true))
    --- End diff --
    
    I'm not an expert, but any reason for setting the minimum value to 2? Does a value of 1 make sense?


---

[GitHub] nifi pull request #2160: [NiFi-4384] - Enhance PutKudu processor to support ...

Posted by pvillard31 <gi...@git.apache.org>.
Github user pvillard31 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/2160#discussion_r140331750
  
    --- Diff: nifi-nar-bundles/nifi-kudu-bundle/nifi-kudu-processors/src/main/java/org/apache/nifi/processors/kudu/AbstractKudu.java ---
    @@ -124,11 +148,14 @@ public void OnScheduled(final ProcessContext context) {
                     kuduTable = this.getKuduTable(kuduClient, tableName);
                     getLogger().debug("Kudu connection successfully initialized");
                 }
    +
    --- End diff --
    
    I would add
    
    ````java
    .expressionLanguageSupported(true)
    ````
    To table name, Kudu masters, and batch size properties.
    
    Then, when you retrieve the value you add ``.evaluateAttributeExpressions()`` before ``getValue`` or ``asInteger``. Example:
    
    ````java
    tableName = context.getProperty(TABLE_NAME).evaluateAttributeExpressions().getValue();
    ````
    
    This way a user can use expression language in the property value to reference externalized variables: in the variable registry, environment variables, etc. It's particularly useful when you're moving a workflow from a development environment to a production environment: you can set your table name in a variable "myTable" and this way you have the same workflow in both environments. It's just a matter of setting a different value for this variable (it's easier than modifying a workflow). And if you have multiple instances of the processors you can update all the processors by changing the value in one place only.
    
    TBH expression language should be enabled on almost all the properties as it's really helping the continuous deployment process.


---

[GitHub] nifi pull request #2160: [NiFi-4384] - Enhance PutKudu processor to support ...

Posted by pvillard31 <gi...@git.apache.org>.
Github user pvillard31 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/2160#discussion_r140266879
  
    --- Diff: nifi-nar-bundles/nifi-kudu-bundle/nifi-kudu-processors/src/main/java/org/apache/nifi/processors/kudu/AbstractKudu.java ---
    @@ -94,6 +95,27 @@
                 .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
                 .build();
     
    +    protected static final PropertyDescriptor FLUSH_MODE = new PropertyDescriptor.Builder()
    +            .name("Flush Mode")
    +            .description("Set the new flush mode for a kudu session\n" +
    --- End diff --
    
    Would it be cleaner to define AllowableValue objects and set the description of each allowable value instead of doing it in the property description? (example in AbstractPutHBase). It's just a general question... I don't have a strong opinion on this.


---

[GitHub] nifi pull request #2160: [NiFi-4384] - Enhance PutKudu processor to support ...

Posted by cammachusa <gi...@git.apache.org>.
Github user cammachusa commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/2160#discussion_r140339764
  
    --- Diff: nifi-nar-bundles/nifi-kudu-bundle/nifi-kudu-processors/src/main/java/org/apache/nifi/processors/kudu/AbstractKudu.java ---
    @@ -124,11 +148,14 @@ public void OnScheduled(final ProcessContext context) {
                     kuduTable = this.getKuduTable(kuduClient, tableName);
                     getLogger().debug("Kudu connection successfully initialized");
                 }
    +
    --- End diff --
    
    Got it, and thank you for your notes @pvillard31 . I will have a push to fix those soon.


---

[GitHub] nifi pull request #2160: [NiFi-4384] - Enhance PutKudu processor to support ...

Posted by cammachusa <gi...@git.apache.org>.
Github user cammachusa commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/2160#discussion_r140365671
  
    --- Diff: nifi-nar-bundles/nifi-kudu-bundle/nifi-kudu-processors/src/main/java/org/apache/nifi/processors/kudu/AbstractKudu.java ---
    @@ -94,6 +97,29 @@
                 .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
                 .build();
     
    +    protected static final PropertyDescriptor FLUSH_MODE = new PropertyDescriptor.Builder()
    +            .name("Flush Mode")
    +            .description("Set the new flush mode for a kudu session\n" +
    +                    "AUTO_FLUSH_SYNC: the call returns when the operation is persisted, else it throws an exception.\n" +
    +                    "AUTO_FLUSH_BACKGROUND: the call returns when the operation has been added to the buffer. This call should normally perform only fast in-memory" +
    +                    " operations but it may have to wait when the buffer is full and there's another buffer being flushed.\n" +
    +                    "MANUAL_FLUSH: the call returns when the operation has been added to the buffer, else it throws a KuduException if the buffer is full.")
    +            .allowableValues(SessionConfiguration.FlushMode.values())
    +            .defaultValue(SessionConfiguration.FlushMode.AUTO_FLUSH_BACKGROUND.toString())
    +            .required(true)
    +            .build();
    +
    +    protected static final PropertyDescriptor BATCH_SIZE = new PropertyDescriptor.Builder()
    +            .name("Batch Size")
    +            .description("Set the number of operations that can be buffered, between 2 - 100000. " +
    +                    "Depend on your memory size, and data size per row set an appropriate batch size. " +
    +                    "Gradually increase this number to find out your best one for best performance")
    +            .defaultValue("100")
    --- End diff --
    
    Like, I made in note in the description. It's depend on their memory size, and data row being inserted, and also their cluster size. Setting the buffer size too big won't help, and too small won't help either. And at noted, developer got to find out this number from his environment. A lot of people hit performance peak at 50 with single machine Kudu's cluster. My colleague hit performance peak at 3500 with 6 nodes cluster (10 CPU, 64 GB Memory each). I randomly pick 100 as I saw it from other Put-xxx processor, but I don't want to put 1000 since most developers test it with single machine, and would leave this default value.


---