Posted to issues@nifi.apache.org by "Matthew Clarke (JIRA)" <ji...@apache.org> on 2017/03/06 18:51:33 UTC

[jira] [Commented] (NIFI-2987) RPG does not do load-balancing well when getting FlowFiles from output ports

    [ https://issues.apache.org/jira/browse/NIFI-2987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15897821#comment-15897821 ] 

Matthew Clarke commented on NIFI-2987:
--------------------------------------

The same condition occurs when the RPG is used to push FlowFiles to a target NiFi cluster. Consider the following common flow:

SplitText  ---> RPG (pointing at destination NiFi cluster)     

SplitText produces all of its splits at the same time (let's assume 10,000 splits). All 10,000 resulting FlowFiles end up on only one node of the target NiFi cluster rather than being load-balanced among all nodes. The next batch of split FlowFiles will go to a different node, but that may happen infrequently. The result is that one downstream node gets hammered each time SplitText runs.

The RPG should expose a property that lets the user set the maximum number of FlowFiles per transferred batch.
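To illustrate why a per-batch limit would help, here is a minimal sketch of the push case. It is not NiFi code. It assumes that each site-to-site transaction goes to the next target node in round-robin order and that, without a limit, one transaction carries everything queued at that moment. The function name distribute_push and the max_batch parameter are hypothetical.

```python
def distribute_push(queued, target_nodes, max_batch=None):
    """Hypothetical model of an RPG pushing a burst of queued FlowFiles.

    Assumption: each transaction goes to the next target node in round-robin
    order and carries at most max_batch FlowFiles (None = no limit, i.e. the
    whole queue in one transaction). Returns the count each node receives.
    """
    if target_nodes <= 0 or (max_batch is not None and max_batch <= 0):
        raise ValueError("target_nodes and max_batch must be positive")
    received = [0] * target_nodes
    remaining = queued
    node = 0
    while remaining > 0:
        # One transaction: send up to max_batch FlowFiles to the current node.
        batch = remaining if max_batch is None else min(remaining, max_batch)
        received[node] += batch
        remaining -= batch
        # The next transaction goes to the next node.
        node = (node + 1) % target_nodes
    return received


print(distribute_push(10_000, 3))                 # [10000, 0, 0]
print(distribute_push(10_000, 3, max_batch=100))  # [3400, 3300, 3300]
```

With no limit, the entire SplitText burst lands on one node. Capping each transaction at 100 FlowFiles spreads the same burst across all three nodes.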

> RPG does not do load-balancing well when getting FlowFiles from output ports
> ----------------------------------------------------------------------------
>
>                 Key: NIFI-2987
>                 URL: https://issues.apache.org/jira/browse/NIFI-2987
>             Project: Apache NiFi
>          Issue Type: New Feature
>          Components: Core Framework
>    Affects Versions: 1.0.0
>            Reporter: Matthew Clarke
>
> When an RPG connects to a destination system's output port, it retrieves every FlowFile queued at the time of the connection. If the source system with the RPG is a NiFi cluster, only one node in that cluster receives all the FlowFiles from that output port. Even with a steady stream of FlowFiles to the output port, delivery is still not truly balanced.
> We need to be able to limit the number of FlowFiles per connection when the source of the FlowFiles is an output port.
> When the destination system with the output port is a cluster and the source system is a cluster with an RPG, the first node to connect will pull all data from the destination node with the highest queue. The next node will pull from the destination node with the next-highest queue, and so on. There is also no guarantee that every node in the source cluster will get any data.
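The pull behavior described in the issue can be modeled the same way. This is a minimal sketch, not NiFi's implementation, and it rests on three assumptions:

- Source nodes poll one at a time in turn.
- Each poll targets the destination node with the deepest queue.
- Without a limit, a poll drains that queue completely.

The function simulate_pulls and its max_batch parameter are hypothetical.

```python
def simulate_pulls(dest_queues, source_nodes, max_batch=None):
    """Hypothetical model of RPG pulls from a clustered output port.

    Source nodes poll in turn; each poll picks the destination node with the
    highest queue and takes up to max_batch FlowFiles (None = no limit, i.e.
    everything queued there). Stops once every destination queue is empty.
    Returns the number of FlowFiles each source node received.
    """
    if source_nodes <= 0 or (max_batch is not None and max_batch <= 0):
        raise ValueError("source_nodes and max_batch must be positive")
    queues = list(dest_queues)  # copy so the caller's list is not modified
    received = [0] * source_nodes
    poll = 0
    while sum(queues) > 0:
        source = poll % source_nodes
        # Target the destination node with the highest queue.
        target = max(range(len(queues)), key=lambda i: queues[i])
        take = queues[target] if max_batch is None else min(queues[target], max_batch)
        queues[target] -= take
        received[source] += take
        poll += 1
    return received


print(simulate_pulls([6000, 4000], 3))                 # [6000, 4000, 0]
print(simulate_pulls([6000, 4000], 3, max_batch=500))  # [3500, 3500, 3000]
```

Without a limit, the third source node receives nothing, which matches the "no guarantee that every node ... will get any data" case. Limiting each pull to 500 FlowFiles gives every source node a share.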


