You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@nifi.apache.org by Mark Bean <ma...@gmail.com> on 2018/07/13 19:25:46 UTC

[DISCUSS] NiFi shared file system

I'd like to start a discussion on making NiFi Clustering more robust. As it
is, a Cluster is very beneficial from a management standpoint. A change
only needs to be made in one place - any place - and it is replicated
across the cluster. However, from a data perspective, the same flexibility
is not available. Data is node-specific and a flowfile will only ever exist
on a single node. Start thinking single point of failure. It would be
desirable if data could be shuffled over to the same point in the flow on
another (less busy) node to complete processing.

There are many factors to consider such as the time it would take to
transfer flowfiles to another node versus waiting for a transient spike in
load to return to normal levels. However, consider the other extreme, where
a node crashes or for some reason is removed from the cluster (but the file
system is still available). The data could be recovered from the
problematic node and processed by surviving nodes. This also allows for
flexibility to dynamically spin up _and down_ additional NiFi instances
on-demand while not having to wait to bleed off data when removing a node
from the cluster.

Fundamentally, this comes down to some form of shared file system across
the Cluster.

Admittedly, there are very complex issues involved not the least of which
is proper synchronization. Yet the concept could provide huge benefits
especially in terms of load balancing across the Cluster.

-Mark