You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@nifi.apache.org by zenfenan <gi...@git.apache.org> on 2018/11/12 10:19:12 UTC

[GitHub] nifi pull request #3167: NIFI-5812: Marked Database processors as 'PrimaryNo...

GitHub user zenfenan opened a pull request:

    https://github.com/apache/nifi/pull/3167

    NIFI-5812: Marked Database processors as 'PrimaryNodeOnly'

    Thank you for submitting a contribution to Apache NiFi.
    
    In order to streamline the review of the contribution we ask you
    to ensure the following steps have been taken:
    
    ### For all changes:
    - [x] Is there a JIRA ticket associated with this PR? Is it referenced 
         in the commit message?
    
    - [x] Does your PR title start with NIFI-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
    
    - [x] Has your PR been rebased against the latest commit within the target branch (typically master)?
    
    - [x] Is your initial contribution a single, squashed commit?
    
    ### For code changes:
    - [ ] Have you ensured that the full suite of tests is executed via mvn -Pcontrib-check clean install at the root nifi folder?
    - [ ] Have you written or updated unit tests to verify your changes?
    - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)? 
    - [ ] If applicable, have you updated the LICENSE file, including the main LICENSE file under nifi-assembly?
    - [ ] If applicable, have you updated the NOTICE file, including the main NOTICE file found under nifi-assembly?
    - [ ] If adding new Properties, have you added .displayName in addition to .name (programmatic access) for each of the new properties?
    
    ### For documentation related changes:
    - [ ] Have you ensured that format looks appropriate for the output in which it is rendered?
    
    ### Note:
    Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zenfenan/nifi NIFI-5812

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/nifi/pull/3167.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3167
    
----
commit 1d848c036437b485bac0b58c636789cfc96a8f52
Author: zenfenan <si...@...>
Date:   2018-11-12T10:16:26Z

    NIFI-5812: Marked Database processors as PrimaryNodeOnly

----


---

[GitHub] nifi issue #3167: NIFI-5812: Marked Database processors as 'PrimaryNodeOnly'

Posted by pvillard31 <gi...@git.apache.org>.
Github user pvillard31 commented on the issue:

    https://github.com/apache/nifi/pull/3167
  
    I'd definitely like (a lot) to see something at framework level to make a processor invalid if it is set on Primary Node Only with an incoming connection.


---

[GitHub] nifi issue #3167: NIFI-5812: Marked Database processors as 'PrimaryNodeOnly'

Posted by MikeThomsen <gi...@git.apache.org>.
Github user MikeThomsen commented on the issue:

    https://github.com/apache/nifi/pull/3167
  
    @zenfenan If you're too busy, I can make that change and merge.


---

[GitHub] nifi issue #3167: NIFI-5812: Marked Database processors as 'PrimaryNodeOnly'

Posted by pvillard31 <gi...@git.apache.org>.
Github user pvillard31 commented on the issue:

    https://github.com/apache/nifi/pull/3167
  
    +1, merged to master, thanks @zenfenan 


---

[GitHub] nifi pull request #3167: NIFI-5812: Marked Database processors as 'PrimaryNo...

Posted by mattyb149 <gi...@git.apache.org>.
Github user mattyb149 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/3167#discussion_r232697514
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/GenerateTableFetch.java ---
    @@ -114,6 +114,7 @@
             + "max value for max value columns. Properties should be added in the format `initial.maxvalue.<max_value_column>`. This value is only used the first time "
             + "the table is accessed (when a Maximum Value Column is specified). In the case of incoming connections, the value is only used the first time for each table "
             + "specified in the flow files.")
    +@PrimaryNodeOnly
    --- End diff --
    
    Since GenerateTableFetch can accept incoming flow files, I don't think we should restrict this to run only on the primary node. For example, you could do a ListDatabaseTables, then distribute the flow files among the cluster, where each node's downstream GTF could do the fetch in parallel. In fact that's the main reason we have GTF rather than just QueryDatabaseTable.


---

[GitHub] nifi issue #3167: NIFI-5812: Marked Database processors as 'PrimaryNodeOnly'

Posted by mattyb149 <gi...@git.apache.org>.
Github user mattyb149 commented on the issue:

    https://github.com/apache/nifi/pull/3167
  
    I did something like that for the Hive 1 version of PutHiveStreaming, the underlying library wasn't thread-safe if you were working on the same table, so I put in a "table lock" where multiple threads couldn't act on the same table. With QDT it doesn't take incoming flow files, so the table name is effectively hard-coded. By adding PrimaryNodeOnly we can guarantee that QDT only has one instance (it is already TriggerSerially so can't have multiple threads). For GTF if you use ListDatabaseTables on the primary node only (not sure if we should force that with this annotation or not) then each flow file should have a different table, and using a load-balanced connection (or RPG -> Input Port) then each instance of GTF should be working on a different table. The onus is on the user to set Max Concurrent Tasks for GTF to 1 to prevent multi-threaded execution.
    
    Perhaps for GTF, instead of forcing PrimaryNodeOnly, we can make it clear in the doc that if there are no incoming connections, it should probably be run on the primary node only. Alternatively, maybe during annotation processing we can enforce that processors annotated with PrimaryNodeOnly automatically have an InputRequirement of INPUT_FORBIDDEN? Otherwise flow files can get stalled if they are in the queue on a node that is not the primary.


---

[GitHub] nifi issue #3167: NIFI-5812: Marked Database processors as 'PrimaryNodeOnly'

Posted by mattyb149 <gi...@git.apache.org>.
Github user mattyb149 commented on the issue:

    https://github.com/apache/nifi/pull/3167
  
    Removing the PrimaryNodeOnly annotation from GTF causes a Checkstyle violation due to the unused import. Can you remove the import statement as well? Wouldn't hurt to rebase (not merge) as well. Please and thanks!


---

[GitHub] nifi issue #3167: NIFI-5812: Marked Database processors as 'PrimaryNodeOnly'

Posted by zenfenan <gi...@git.apache.org>.
Github user zenfenan commented on the issue:

    https://github.com/apache/nifi/pull/3167
  
    Since there is some thought process required on how the primary node only restriction be effectively made on the `GenerateTableFetch`, I'll change this PR to only have the annotation on the `QueryDatabaseTable`. Sounds good?


---

[GitHub] nifi issue #3167: NIFI-5812: Marked Database processors as 'PrimaryNodeOnly'

Posted by zenfenan <gi...@git.apache.org>.
Github user zenfenan commented on the issue:

    https://github.com/apache/nifi/pull/3167
  
    @mattyb149 Fixed the checkstyle violation and rebased.


---

[GitHub] nifi pull request #3167: NIFI-5812: Marked Database processors as 'PrimaryNo...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/nifi/pull/3167


---

[GitHub] nifi issue #3167: NIFI-5812: Marked Database processors as 'PrimaryNodeOnly'

Posted by patricker <gi...@git.apache.org>.
Github user patricker commented on the issue:

    https://github.com/apache/nifi/pull/3167
  
    @mattyb149 I'm in agreement, but I feel like the spirit of this ticket is still an interesting direction to look at.  How do you prevent QDB or GTF from running against the same table on two nodes or two threads at the same time? Multiple threads sounds reasonable, through some kind of lock in the processor, but across multiple nodes? Would be interesting to come up with a solution.


---

[GitHub] nifi pull request #3167: NIFI-5812: Marked Database processors as 'PrimaryNo...

Posted by MikeThomsen <gi...@git.apache.org>.
Github user MikeThomsen commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/3167#discussion_r234999192
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/GenerateTableFetch.java ---
    @@ -114,6 +114,7 @@
             + "max value for max value columns. Properties should be added in the format `initial.maxvalue.<max_value_column>`. This value is only used the first time "
             + "the table is accessed (when a Maximum Value Column is specified). In the case of incoming connections, the value is only used the first time for each table "
             + "specified in the flow files.")
    +@PrimaryNodeOnly
    --- End diff --
    
    Agreed


---