Posted to users@nifi.apache.org by na...@bt.com on 2020/05/21 08:58:44 UTC

State Management in a Cluster

Hi All,

Apologies if this is an obvious question, but I've had a search through the administration guide and it's left me more confused!

At the moment in our NiFi cluster deployment, we configure the ZooKeeper provider in state-management.xml and leave the local provider as it is (so pointing at ./state/local). My main question is whether we should have both the local and ZooKeeper providers enabled? It seems that should be the case, but I just wanted to clarify.
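For reference, our state-management.xml looks roughly like the sketch below. The provider IDs and the ZooKeeper connect string are placeholders for our environment, not our exact values:

```xml
<stateManagement>
    <!-- Node-local state, kept on each node's own disk -->
    <local-provider>
        <id>local-provider</id>
        <class>org.apache.nifi.controller.state.providers.local.WriteAheadLocalStateProvider</class>
        <property name="Directory">./state/local</property>
    </local-provider>

    <!-- Cluster-wide state, kept in ZooKeeper -->
    <cluster-provider>
        <id>zk-provider</id>
        <class>org.apache.nifi.controller.state.providers.zookeeper.ZooKeeperStateProvider</class>
        <property name="Connect String">zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181</property>
        <property name="Root Node">/nifi</property>
        <property name="Session Timeout">10 seconds</property>
        <property name="Access Control">Open</property>
    </cluster-provider>
</stateManagement>
```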

Secondly, if both the local and ZooKeeper state management should be used, what are the differences, if any, in the data stored by each provider?

Kind Regards,

Nathan

RE: State Management in a Cluster

Posted by na...@bt.com.
Hi Mark,

It makes complete sense now, thanks for clearing it up!

Nathan

From: Mark Payne [mailto:markap14@hotmail.com]
Sent: 21 May 2020 15:35
To: users@nifi.apache.org
Subject: Re: State Management in a Cluster



Re: State Management in a Cluster

Posted by Mark Payne <ma...@hotmail.com>.
Nathan,

Yes, both are needed. Some processors store local state while others store clustered state. The difference is whether the state being stored should be readable by all nodes in the cluster or only by the local node. Some processors can make use of either, or both.

ListFile is a good example. ListFile creates an output FlowFile for each file in a given directory and keeps state about the files that it has already listed. The processor is configured with a property that indicates whether the directory it's monitoring is on a local file system or a network-mounted drive (an NFS mount, for example). If the directory is on the local file system, each node in the cluster will want to monitor the directory and store state about its own local file system, so it uses Local State Management.

On the other hand, if it's an NFS mount, the processor should run only on the Primary Node and the state should be shared across the cluster. This way, if the Primary Node is shut down or crashes, a new Primary Node is elected and can read the state that was stored by the previous node. For that to work, the state must be shared across all nodes in the cluster, so it's stored using the Cluster State Provider.
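This is also why both providers get wired up in nifi.properties: it points at one local provider and one cluster provider by the IDs defined in state-management.xml. A sketch, using the IDs from a stock install (adjust to match your own file):

```
nifi.state.management.configuration.file=./conf/state-management.xml
nifi.state.management.provider.local=local-provider
nifi.state.management.provider.cluster=zk-provider
```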

Does that make sense?

Thanks
-Mark

