Posted to issues@nifi.apache.org by "Koji Kawamura (JIRA)" <ji...@apache.org> on 2018/12/10 01:53:00 UTC

[jira] [Commented] (NIFI-5882) Connector Prioritizers doesn't work together with Load Balance Strategy

    [ https://issues.apache.org/jira/browse/NIFI-5882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16714209#comment-16714209 ] 

Koji Kawamura commented on NIFI-5882:
-------------------------------------

Hi [~jzahner], there is a note in the User Guide about how prioritizers work when load balancing is configured on the same connection:
{quote}With a [Load Balance Strategy|https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#load_balance_strategy] configured, the connection has a queue per node in addition to the local queue. The prioritizer will sort the data in each queue independently.
{quote}
[https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#prioritization]

Do you see behavior that differs from what the doc above describes?

 

My understanding of how things work is:
 * Suppose there are 5 FlowFiles (p1 to p5) on the primary node of a 3-node cluster
 * and a connection is configured with Round Robin load balancing; then the primary node has 1 local queue (qL) and 2 remote queues (qR1, qR2) for that connection
 * The 5 FlowFiles are then distributed across these queues on the primary node, for example:
 ** qL: p1, p4
 ** qR1: p2, p5 (to be sent to node1)
 ** qR2: p3 (to be sent to node2)
 * The connection also has a PriorityAttributePrioritizer
 * In this case, the primary node applies the prioritizer to each remote queue, e.g. in qR1, p2 is sent to the other node before p5
 * After load balancing has finished, each node has the following FlowFiles in its local queue:
 ** Primary node local queue: p1, p4
 ** node1 local queue: p2, p5
 ** node2 local queue: p3
 * Then each node's local queue applies PriorityAttributePrioritizer again when handing FlowFiles to the next processor, FetchSFTP in the reported flow
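
The steps above can be sketched as a small simulation. This is only an illustration of my description, not NiFi's actual implementation: the function names, the dict-based FlowFiles and the priority values are made up, and it assumes that a lower "priority" value is served first.

```python
from itertools import cycle


def distribute_round_robin(flowfiles, queue_names):
    """Assign FlowFiles to queues in round-robin order (illustrative only)."""
    queues = {name: [] for name in queue_names}
    for flowfile, name in zip(flowfiles, cycle(queue_names)):
        queues[name].append(flowfile)
    return queues


def prioritize(queue):
    """Sort a single queue by its numeric 'priority' attribute.

    Assumption: a lower value is served first.
    """
    return sorted(queue, key=lambda ff: int(ff["priority"]))


# Five FlowFiles p1..p5 on the primary node; priority values are hypothetical.
flowfiles = [{"name": f"p{i}", "priority": str(i)} for i in range(1, 6)]

# One local queue and two remote queues on the primary node.
queues = distribute_round_robin(flowfiles, ["qL", "qR1", "qR2"])

# The prioritizer is applied to each queue independently.
ordered = {
    name: [ff["name"] for ff in prioritize(queue)]
    for name, queue in queues.items()
}
print(ordered)
```

Each queue is sorted on its own, so the result is ordered within qL, qR1 and qR2, but nothing ties the three queues together.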

A prioritizer only manages ordering within each node's queues. Cluster-wide ordering is not possible once the data has been distributed.
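
To illustrate why per-node ordering does not give a cluster-wide order, here is a rough sketch. It assumes each node works through its own local queue in priority order at its own pace; the per-node durations are hypothetical numbers chosen only to show the effect, and the function is not part of NiFi.

```python
def completion_order(local_queues, seconds_per_flowfile):
    """Return FlowFile names in the order they finish across all nodes.

    Each node processes its own queue sequentially and independently.
    """
    events = []
    for node, queue in local_queues.items():
        elapsed = 0.0
        for name in queue:
            elapsed += seconds_per_flowfile[node]
            events.append((elapsed, name))
    # Sort by finish time to get the order observed across the cluster.
    return [name for _, name in sorted(events)]


# Local queues after load balancing, each already sorted by priority.
local_queues = {
    "primary": ["p1", "p4"],
    "node1": ["p2", "p5"],
    "node2": ["p3"],
}

# Hypothetical processing time per FlowFile on each node.
seconds_per_flowfile = {"primary": 3.0, "node1": 1.0, "node2": 2.5}

print(completion_order(local_queues, seconds_per_flowfile))
```

With these made-up timings p2 and p5 finish before p1, even though every node handled its own FlowFiles in priority order.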
 I hope this explanation clarifies what NiFi does and matches what you see.
 If not, please elaborate on the issue by pointing out specific FlowFiles that you think are out of order.
 I looked for such cases in the two attached FlowFile list images, but couldn't find any. Thanks!

> Connector Prioritizers doesn't work together with Load Balance Strategy
> -----------------------------------------------------------------------
>
>                 Key: NIFI-5882
>                 URL: https://issues.apache.org/jira/browse/NIFI-5882
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>    Affects Versions: 1.8.0
>         Environment: Centos 7.5, Secured 8 Node NiFi Cluster
>            Reporter: Josef Zahner
>            Priority: Major
>         Attachments: connector_config.png, queue_with_one_processor.png, queue_with_two_processors.png, template_overview.png
>
>
> For my template please check the picture "template_overview.png". On the left hand side the working (two processor) example and on the right hand side the not working one (one processor).
> I have a ListSFTP processor which reads files from 4 different folders. Each filename contains a number (epoch time), which I parse and set as the "priority" attribute. We have a cluster, so what I want to achieve for FetchSFTP is that the files are fetched in order and are distributed equally over our 8-node cluster.
> However, it seems that if I set the "priority" attribute in an UpdateAttribute processor and use the following features on the directly attached connector:
>  * Load Balance Strategy: Round Robin
>  * Select Prioritizers: PriorityAttributePrioritizer
> the prioritizer doesn't seem to have any impact. 
> If I set the priority attribute in an extra processor and use only the prioritizer on its connector, all files are in order but still on the primary node. On the next processor I then set the load balancing strategy for the cluster (and add another attribute, which doesn't matter) together with the prioritizer. That way it works. A picture of the queue for both examples is attached (queue_with_one_processor.png & queue_with_two_processors.png).
> *To sum up*, it seems that if I set the "priority" attribute in an UpdateAttribute processor and try to use it directly on the attached connector with a load balancing strategy and the prioritizer (PriorityAttributePrioritizer), the priority attribute doesn't work as expected. If I set the "priority" attribute in a separate processor and then configure the load balancing strategy together with the prioritizer on an additional processor, it works. 
> Cheers



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)