Posted to users@nifi.apache.org by James McMahon <js...@gmail.com> on 2017/10/19 12:58:24 UTC

Prioritizing flowFiles to tailor throughput

Our team is considering ways to accelerate delivery for certain subsets of
content we process through NiFi. We are using Apache NiFi 0.7.x as our
baseline.

This link discusses a recommended approach to content prioritization using
PriorityAttributePrioritizer on a connector (queue) to tailor throughput
based on a priority attribute we set upstream in our flow:

https://stackoverflow.com/questions/42528993/how-to-specify-priority-attributes-for-individual-flowfiles

How often does the connector queue have to re-sort contents in order to
enforce our priority attribute? Is it re-sorting *every* single time new
flowFiles hit the queue? Won't that markedly and negatively impact
performance?

If our priority 1s are a huge volume of flowfiles that persists over time,
won't this approach cause our priority 2s, 3s, etc etc to languish in queue?

The described approach seems to embed significant business logic in the
NiFi workflows. In an environment where priorities change often, would that
be considered a poor approach? Might it be better to enforce priority
processing at a higher architectural level - a lightweight NiFi server to
accelerate delivery of priority one content and email alerts, a priority
two suite of NiFi servers for standard flowfile volume, a priority three
suite of servers to handle long-term bulk processing, etc etc?

Thanks in advance for your help.  -Jim

Re: Prioritizing flowFiles to tailor throughput

Posted by Joe Percivall <jp...@apache.org>.
Hey James,

Sorry no one responded when you first sent the message, but I'm curious
what you ended up doing and any findings you had. Also, I wanted to bring
this thread back up to the attention of the larger group as it brings up
some interesting questions I haven't found discussed elsewhere.

On the topic of the re-sorting of the queue, I was curious about the
answer, so I dug down to the StandardFlowFileQueue and found that it's
primarily just wrapping an instance of Java's PriorityQueue for its active
queue[1]. Since PriorityQueue is a binary heap, the queue is not fully
re-sorted on each enqueue: each insert costs O(log n) to keep the heap
ordered, and we always have immediate access to the head of the queue. I'm sure
someone else (Mark Payne?) could explain better how we make use of the
nuances of the queue for better performance and the impacts the different
queue prioritizers have.
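
To make the enqueue cost concrete, here is a minimal standalone sketch of a
java.util.PriorityQueue ordered by a priority value. It is not NiFi code:
the Item class, the "lower number wins" rule, and the arrival tie-breaker
are assumptions made for illustration, and the real prioritizers may
compare FlowFiles differently.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class HeapQueueSketch {

    // Hypothetical stand-in for a FlowFile; NOT NiFi's FlowFile interface.
    static final class Item {
        final String name;
        final int priority;  // assumption: lower value means higher priority
        final long arrival;  // tie-breaker so equal priorities stay first-in, first-out

        Item(String name, int priority, long arrival) {
            this.name = name;
            this.priority = priority;
            this.arrival = arrival;
        }
    }

    static List<String> drainOrder() {
        Comparator<Item> byPriorityThenArrival =
                Comparator.comparingInt((Item i) -> i.priority)
                          .thenComparingLong(i -> i.arrival);
        PriorityQueue<Item> queue = new PriorityQueue<>(byPriorityThenArrival);

        // Each offer() sifts the new element up the heap: O(log n) per
        // insert, rather than a full re-sort of the queue contents.
        queue.offer(new Item("bulk-1", 3, 0));
        queue.offer(new Item("alert-1", 1, 1));
        queue.offer(new Item("standard-1", 2, 2));
        queue.offer(new Item("alert-2", 1, 3));

        // poll() removes the current head and restores heap order in O(log n).
        List<String> order = new ArrayList<>();
        while (!queue.isEmpty()) {
            order.add(queue.poll().name);
        }
        return order;
    }

    public static void main(String[] args) {
        System.out.println(drainOrder());
    }
}
```

Running main prints [alert-1, alert-2, standard-1, bulk-1]. Both priority 1
items drain before anything else, which is also why a sustained stream of
priority 1 items would keep lower priorities waiting.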

For the higher priority FlowFiles starving out lower priority ones, I'm
thinking about a way to give a weight instead of a priority. So in essence,
a "weighted funnel processor", which grabs X FlowFiles each time but has a
weighting assigned to different categories such that you take a certain
number of each category based on a given weight. That said, I'm not sure
that would be guaranteed to work when FlowFiles in the queue are swapped
out since even if we iterated over everything in the incoming connection,
there are still others swapped to disk. Also, there are probably performance
concerns if we tried to implement it using the current tools offered to a
processor.
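
The weighting idea can be sketched outside NiFi as follows. This is only an
illustration under stated assumptions: WeightedBatchSketch, its method
names, and the per-category in-memory queues are hypothetical. It is not a
NiFi processor, and it ignores the swapped-to-disk problem mentioned above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: one FIFO queue per category, drained by weight.
final class WeightedBatchSketch {
    private final Map<String, Integer> weights = new LinkedHashMap<>();
    private final Map<String, Deque<String>> queues = new LinkedHashMap<>();

    void addCategory(String category, int weight) {
        if (weight <= 0) {
            throw new IllegalArgumentException("weight must be positive");
        }
        weights.put(category, weight);
        queues.put(category, new ArrayDeque<>());
    }

    void enqueue(String category, String item) {
        queues.get(category).addLast(item);
    }

    // Take up to batchSize items, giving each category a share proportional to its weight.
    List<String> nextBatch(int batchSize) {
        int totalWeight = 0;
        for (int w : weights.values()) {
            totalWeight += w;
        }
        List<String> batch = new ArrayList<>();
        // First pass: each category gets its proportional quota, rounded down.
        for (Map.Entry<String, Integer> e : weights.entrySet()) {
            int quota = batchSize * e.getValue() / totalWeight;
            Deque<String> q = queues.get(e.getKey());
            for (int i = 0; i < quota && !q.isEmpty(); i++) {
                batch.add(q.pollFirst());
            }
        }
        // Second pass: fill any unused slots, visiting categories in insertion order.
        for (Deque<String> q : queues.values()) {
            while (batch.size() < batchSize && !q.isEmpty()) {
                batch.add(q.pollFirst());
            }
        }
        return batch;
    }

    static List<String> demo() {
        WeightedBatchSketch funnel = new WeightedBatchSketch();
        funnel.addCategory("p1", 6);
        funnel.addCategory("p2", 3);
        funnel.addCategory("p3", 1);
        for (int i = 0; i < 20; i++) funnel.enqueue("p1", "p1-" + i);
        for (int i = 0; i < 5; i++) funnel.enqueue("p2", "p2-" + i);
        for (int i = 0; i < 5; i++) funnel.enqueue("p3", "p3-" + i);
        return funnel.nextBatch(10);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

With weights 6, 3, and 1 and a batch of 10, the demo takes six p1 items,
three p2 items, and one p3 item, so the lower categories keep moving even
while p1 has a backlog.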

For the separate NiFis approach, I'm curious what others' views are.
Personally, it makes sense to me that for flows that are dramatically
different in priority you'd want to section them off to another instance of
NiFi. Essentially, it's the separation between data-plane and control-plane
instances of NiFi.


Lastly, James, I assume you're limited to using the 0.7.x release for a
specific reason? I'd highly suggest upgrading to the latest version
whenever possible. There are many security and performance improvements,
and of course many new features.

[1]
https://github.com/apache/nifi/blob/7f4cfd51ea07ead6c9b71b6c6d6f87a352b801d3/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-core/src/main/java/org/apache/nifi/controller/StandardFlowFileQueue.java#L89

Joe

On Thu, Oct 19, 2017 at 8:58 AM, James McMahon <js...@gmail.com> wrote:

> Our team is considering ways to accelerate delivery for certain subsets of
> content we process through NiFi. We are using Apache NiFi 0.7.x as our
> baseline.
>
> This link discusses a recommended approach to content prioritization using
> PriorityAttributePrioritizer on a connector (queue) to tailor throughput
> based on a priority attribute we set upstream in our flow:
>
> https://stackoverflow.com/questions/42528993/how-to-specify-priority-attributes-for-individual-flowfiles
>
> How often does the connector queue have to re-sort contents in order to
> enforce our priority attribute? Is it re-sorting *every* single time new
> flowFiles hit the queue? Won't that markedly and negatively impact
> performance?
>
> If our priority 1s are a huge volume of flowfiles that persists over time,
> won't this approach cause our priority 2s, 3s, etc etc to languish in queue?
>
> The described approach seems to embed significant business logic in the
> NiFi workflows. In an environment where priorities change often, would that
> be considered a poor approach? Might it be better to enforce priority
> processing at a higher architectural level - a lightweight NiFi server to
> accelerate delivery of priority one content and email alerts, a priority
> two suite of NiFi servers for standard flowfile volume, a priority three
> suite of servers to handle long-term bulk processing, etc etc?
>
> Thanks in advance for your help.  -Jim
>



-- 
*Joe Percivall*
linkedin.com/in/Percivall
e: jpercivall@apache.com