Posted to users@nifi.apache.org by "McDermott, Chris Kevin (MSDU - STaTS/StorefrontRemote)" <ch...@hpe.com> on 2016/03/08 21:39:27 UTC

A couple of general questions

Hi!  I’m fairly new to NiFi but I have installed it and developed a couple of simple flows.  I really like it.

Here are my questions:

  1.  Scale out: does NiFi run different instances of a processor on multiple nodes at the same time or only a single instance on a single node at any given time?
  2.  Split-brain: How does NiFi handle the situation where the cluster nodes can be divided into two groups, A and B, where the nodes in A can all talk to each other and the nodes in B can all talk to each other, but no node in A can talk to any node in B and vice versa?  A classic example would be where I have a NiFi cluster running in two data centers and all the network links between the two data centers go down.

Re: A couple of general questions

Posted by "McDermott, Chris Kevin (MSDU - STaTS/StorefrontRemote)" <ch...@hpe.com>.
Thanks, Mark.  Very helpful on all counts.




On 3/8/16, 3:47 PM, "Mark Payne" <ma...@hotmail.com> wrote:

>Chris,
>
>Welcome to the NiFi community!
>
>Regarding scaling out: The same flow runs on all nodes in a cluster. So all nodes run all Processors. The only exception
>to this is if you use the Scheduling Strategy of "Primary Node" - in that case, the Processor will run on only a single node.
>The idea here is that if you are using some sort of protocol like a pub/sub (for instance, JMS Topics) where each node would
>end up just duplicating the data, you can choose to run that on only a single node.
>
>In terms of separate data centers: your NiFi cluster's Cluster Manager (NCM) would need to be able to communicate with
>all nodes in the cluster. Nodes within a cluster do not communicate with one another, typically. If you were to use site-to-site
>(via Remote Process Groups) to send data back to the same cluster, site-to-site mandates that the nodes be able to communicate
>with one another. However, in the case of the connection going down, the data can simply queue up and be sent when the
>connection is re-established - or the data could be spread out across the other nodes that it can communicate with.
>
>I realize that this is quite a bit of info to take in for someone new to NiFi, so if something is not clear, or if this brings up
>any additional questions, please let us know, and we'll try to provide more clarification.
>
>Again, welcome to the community, and I hope this helps!
>
>-Mark
>
>> On Mar 8, 2016, at 3:39 PM, McDermott, Chris Kevin (MSDU - STaTS/StorefrontRemote) <ch...@hpe.com> wrote:
>> 
>> Hi!  I’m fairly new to NiFi but I have installed it and developed a couple of simple flows.  I really like it.
>> 
>> Here are my questions:
>> 
>>  1.  Scale out: does NiFi run different instances of a processor on multiple nodes at the same time or only a single instance on a single node at any given time?
>>  2.  Split-brain: How does NiFi handle the situation where the cluster nodes can be dived in to two groups A and B, where the nodes in A can all talk to each other, and the nodes in B can all talk to each other, but no nodes in A can talk to any nodes in B and vice versa?  A classic example would be where I have  NiFi cluster running in two data centers and the all the network links between the two data centers go down.
>

Re: A couple of general questions

Posted by Mark Payne <ma...@hotmail.com>.
Chris,

Welcome to the NiFi community!

Regarding scaling out: The same flow runs on all nodes in a cluster. So all nodes run all Processors. The only exception
to this is if you use the Scheduling Strategy of "Primary Node" - in that case, the Processor will run on only a single node.
The idea here is that if you are using some sort of protocol like a pub/sub (for instance, JMS Topics) where each node would
end up just duplicating the data, you can choose to run that on only a single node.
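A minimal Python sketch of the scheduling rule described above, under stated assumptions. It is not NiFi code: `Processor`, `primary_node_only`, `nodes_running`, and `copies_per_topic_message` are hypothetical names that only model "all nodes run all Processors, except those scheduled on the Primary Node", plus the duplication a pub/sub subscriber would cause without that restriction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Processor:
    name: str
    # Hypothetical flag standing in for the "Primary Node" scheduling strategy.
    primary_node_only: bool = False


def nodes_running(processor, cluster_nodes, primary):
    """Return the nodes that would run `processor` in this toy model of a cluster."""
    if primary not in cluster_nodes:
        raise ValueError("the primary node must be a member of the cluster")
    if processor.primary_node_only:
        return [primary]
    # Default case: the same flow runs on every node, so every node runs the processor.
    return list(cluster_nodes)


def copies_per_topic_message(processor, cluster_nodes, primary):
    """A pub/sub topic hands each message to every subscriber, so each running instance gets a copy."""
    return len(nodes_running(processor, cluster_nodes, primary))


nodes = ["node-a", "node-b", "node-c"]
watcher = Processor("file-watcher")                                 # hypothetical processor
subscriber = Processor("topic-subscriber", primary_node_only=True)  # hypothetical processor

print(nodes_running(watcher, nodes, primary="node-b"))     # ['node-a', 'node-b', 'node-c']
print(nodes_running(subscriber, nodes, primary="node-b"))  # ['node-b']
print(copies_per_topic_message(Processor("topic-subscriber"), nodes, "node-b"))  # 3
```

In this model, a topic subscriber left on the default strategy would receive every message three times in a three-node cluster; restricting it to the primary node brings that back to one copy.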

In terms of separate data centers: your NiFi cluster's Cluster Manager (NCM) would need to be able to communicate with
all nodes in the cluster. Nodes within a cluster typically do not communicate with one another. If you were to use site-to-site
(via Remote Process Groups) to send data back to the same cluster, site-to-site mandates that the nodes be able to communicate
with one another. However, in the case of the connection going down, the data can simply queue up and be sent when the
connection is re-established - or the data could be spread out across the other nodes that it can communicate with.
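A toy Python model of the site-to-site behavior described above: data is spread across the peers that are still reachable, and simply stays queued when none are. `SiteToSiteSender` and its methods are hypothetical, and round-robin peer selection is a simplifying assumption; NiFi's own peer selection may differ.

```python
from collections import deque


class SiteToSiteSender:
    """Toy model, not NiFi's implementation: spread data over reachable peers, queue otherwise."""

    def __init__(self, peers):
        self.reachable = {peer: True for peer in peers}
        self.delivered = {peer: [] for peer in peers}
        self.queue = deque()
        self._next = 0  # round-robin cursor (simplifying assumption)

    def set_reachable(self, peer, up):
        if peer not in self.reachable:
            raise KeyError(f"unknown peer: {peer}")
        self.reachable[peer] = up

    def enqueue(self, item):
        self.queue.append(item)

    def flush(self):
        """Send queued items to reachable peers; return how many were sent."""
        peers = [p for p, up in self.reachable.items() if up]
        if not peers:
            return 0  # no peer reachable: the data stays queued until a link comes back
        sent = 0
        while self.queue:
            peer = peers[self._next % len(peers)]
            self._next += 1
            self.delivered[peer].append(self.queue.popleft())
            sent += 1
        return sent


sender = SiteToSiteSender(["dc1-node1", "dc2-node1"])
sender.set_reachable("dc2-node1", False)  # links to the second data center are down
for i in range(4):
    sender.enqueue(f"ff-{i}")
print(sender.flush())                     # 4
print(sender.delivered["dc1-node1"])      # ['ff-0', 'ff-1', 'ff-2', 'ff-3']
sender.set_reachable("dc1-node1", False)  # now no peer is reachable
sender.enqueue("ff-4")
print(sender.flush(), len(sender.queue))  # 0 1
sender.set_reachable("dc2-node1", True)   # link re-established
print(sender.flush())                     # 1
print(sender.delivered["dc2-node1"])      # ['ff-4']
```

The two cases from the paragraph show up directly: while one data center is cut off, everything goes to the peer that is still reachable, and while nothing is reachable the data waits in the queue until a link is restored.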

I realize that this is quite a bit of info to take in for someone new to NiFi, so if something is not clear, or if this brings up
any additional questions, please let us know, and we'll try to provide more clarification.

Again, welcome to the community, and I hope this helps!

-Mark
