Posted to users@nifi.apache.org by Ravi Nallappan <ra...@gmail.com> on 2021/12/16 08:54:06 UTC

NiFi flow.xml.gz corruption

Hi,

A general question regarding flow.xml.gz (NiFi version 1.10.0).

We have about 6-8 NiFi nodes in a cluster (Kubernetes environment), and at
times we see the file get corrupted. This prevents the node from coming up
on restart, and eventually Kubernetes gives up.

Based on my search on the issue, the easiest recovery is to remove the
corrupted flow.xml.gz and let the node come up, join the cluster and
sync up with the golden copy from the other nodes.

However, this will be a challenge to do in a Kubernetes environment. Any
suggestions on this? (We could possibly write a script to check and do the
action, but how do we add a hook to run it just when the pod gets restarted?)

Is NiFi Registry a better recommendation for a production environment? If
yes, we will look at this as a longer-term solution.

Thanks in advance.

regards,

Ravi Nallappan

Re: NiFi flow.xml.gz corruption

Posted by Mark Payne <ma...@hotmail.com>.
Hi Ravi,

Not sure why you would be seeing the flow get corrupted. When you say “corrupted” - do you mean truly corrupted? As in, the file cannot be read/parsed? Or do you mean that it’s out of sync, meaning that NiFi can read it but won’t join the cluster because its flow is different from the cluster’s flow?
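
To tell the two cases apart, one option is to test whether the file still decompresses and parses as XML. The following is a minimal sketch, not something from the NiFi codebase: the function name `flow_is_readable` is hypothetical, and it only checks that the file is readable gzip containing well-formed XML. It says nothing about whether the flow matches the cluster's flow.

```python
import gzip
import xml.etree.ElementTree as ET
import zlib


def flow_is_readable(path):
    """Return True if `path` decompresses as gzip and parses as well-formed XML.

    Hypothetical helper for illustration. A missing file also returns False,
    because opening it raises an OSError subclass.
    """
    try:
        with gzip.open(path, "rb") as f:
            # ET.parse reads the whole stream, so truncation is detected too.
            ET.parse(f)
        return True
    except (OSError, EOFError, zlib.error, ET.ParseError):
        # OSError covers "not a gzip file" and I/O errors, EOFError covers a
        # truncated archive, zlib.error covers damaged compressed data, and
        # ParseError covers content that is not well-formed XML.
        return False
```

If this returns True and the node still refuses to join, the file is more likely out of sync than truly corrupted.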

In either case, though, removing the file and restarting is the easiest option traditionally. Within Kubernetes I can see how this would get complicated. In the upcoming release (1.16) we have made a lot of improvements around clustering that should resolve this. But you’d need to update to the latest to get these improvements.
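
For the "remove the file and restart" step, a script along these lines could run before NiFi starts. This is only a sketch under assumptions: the function name `quarantine_if_unreadable` is hypothetical, the path to flow.xml.gz has to be supplied by whoever runs it, and how it gets wired into the pod startup (for example a wrapper around the container's start command) is not something covered in this thread. It renames the file instead of deleting it, so a copy remains for inspection.

```python
import gzip
import os
import sys
import time
import xml.etree.ElementTree as ET
import zlib


def quarantine_if_unreadable(path):
    """Rename `path` aside if it exists but is not readable gzipped XML.

    Hypothetical helper for illustration. Returns the new file name if the
    file was renamed, otherwise None (file missing or file readable).
    """
    if not os.path.exists(path):
        return None
    try:
        with gzip.open(path, "rb") as f:
            ET.parse(f)  # reads and parses the whole decompressed stream
        return None
    except (OSError, EOFError, zlib.error, ET.ParseError):
        # Keep the damaged file under a timestamped name for later inspection.
        backup = f"{path}.corrupt.{int(time.time())}"
        os.replace(path, backup)
        return backup


if __name__ == "__main__" and len(sys.argv) == 2:
    # Assumed usage: pass the location of flow.xml.gz as the only argument.
    moved = quarantine_if_unreadable(sys.argv[1])
    print(f"moved to {moved}" if moved else "left flow file untouched")
```

Note that this only handles the truly corrupted case. A flow that parses fine but differs from the cluster's flow is left in place.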

I don’t know enough about Kubernetes to make any recommendations about lifecycle hooks, etc. You could certainly reach out to the Kubernetes community about that.

As far as NiFi Registry goes: I don’t think it’s going to help here. NiFi Registry makes it convenient to build a dataflow and then store it in the registry. You can then check out that flow in another NiFi instance (or check out a second copy in the same NiFi instance) and then see when changes are made, update to the newest version of the flow, etc. Think version control for individual Process Groups.

Hope this helps!
-Mark

