Posted to commits@nifi.apache.org by "Andrew Purtell (JIRA)" <ji...@apache.org> on 2015/02/10 00:34:35 UTC

[jira] [Created] (NIFI-337) Automated cluster load balancing

Andrew Purtell created NIFI-337:
-----------------------------------

Summary: Automated cluster load balancing
Key: NIFI-337
URL: https://issues.apache.org/jira/browse/NIFI-337
Project: Apache NiFi
Issue Type: New Feature
Reporter: Andrew Purtell

On dev@ in response to an inquiry, from [~joewitt]:
{quote}
The processors themselves are available and ready to run on all nodes at all times. It's really just a question of whether they have data to run on. We have always taken the view that if you want scalable dataflow, use scalable interfaces. And I think that is the way to go in every case you can pull it off. That generally meant one should use datasources which offer queueing semantics, where multiple independent nodes can pull from the queue with 'at-least-once' guarantees. In addition, each node has back pressure, so if it falls behind it slows its rate of pickup, which means other nodes in the cluster can pick up the slack. This has worked extremely well.

That said, I recognize that it isn't always possible to use scalable interfaces and given enough non-scalable datasources the cluster could become out of balance. So this certainly seems like a good / valuable / fun / non-trivial problem to tackle. If we allow connections between processors to be auto-balanced then it will make for a pretty smooth experience as users won't really have to think too much about it.
{quote}
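
To make the pull-plus-back-pressure behavior described in the quote concrete, here is a minimal simulation sketch. It is not NiFi code: the `Node` class, the `simulate` function, and all parameters (`work_per_tick`, `backpressure_limit`, `batch`) are hypothetical names chosen for illustration, and delivery guarantees such as 'at-least-once' are not modeled. It only shows how nodes that stop pulling once their local backlog is full leave more of a shared queue to faster nodes.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical cluster node with a bounded local backlog (not a NiFi class)."""
    name: str
    work_per_tick: int        # items the node can finish per tick (must be > 0)
    backpressure_limit: int   # stop pulling once the backlog reaches this size
    backlog: deque = field(default_factory=deque)
    completed: int = 0

    def pull(self, shared: deque, batch: int) -> None:
        # Back pressure: pull only while the local backlog is below the limit.
        for _ in range(batch):
            if not shared or len(self.backlog) >= self.backpressure_limit:
                break
            self.backlog.append(shared.popleft())

    def work(self) -> None:
        # Finish up to work_per_tick items from the local backlog.
        for _ in range(min(self.work_per_tick, len(self.backlog))):
            self.backlog.popleft()
            self.completed += 1


def simulate(total_items: int, nodes: list[Node], batch: int = 5) -> dict[str, int]:
    """Run ticks until the shared queue and every backlog are empty."""
    shared = deque(range(total_items))
    while shared or any(n.backlog for n in nodes):
        for n in nodes:
            n.pull(shared, batch)
        for n in nodes:
            n.work()
    return {n.name: n.completed for n in nodes}


if __name__ == "__main__":
    cluster = [Node("fast", work_per_tick=4, backpressure_limit=5),
               Node("slow", work_per_tick=1, backpressure_limit=5)]
    print(simulate(100, cluster))
```

In this sketch the slow node's backlog stays near its limit, so it pulls roughly one item per tick while the fast node takes the rest. The imbalance the issue is concerned with arises when the source cannot be pulled from in this way, so that data lands on one node regardless of its backlog.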

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)