Posted to users@nifi.apache.org by James McMahon <js...@gmail.com> on 2017/02/16 15:05:53 UTC

Restoring from archived flow.xml.gz files

Good morning. Our team has been discussing backup, versioning, and restore
strategies to manage the backups we create of our NiFi flow.xml.gz files. I
have a few questions about this.

Currently I manually execute a daily backup of my NiFi workflow by clicking
the Back-up flow link on the General tab of the Controller Settings UI. Each
time I do this, I get a new file in my archive subdirectory whose name
appears to be prefixed with some sort of UID: [UID]-flow.xml.gz. We plan to
version-control these using Git, in all likelihood.
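Because each Back-up flow click drops a uniquely named file into the archive subdirectory, one way to feed Git from a nightly cron job is to copy over any archive files the Git working tree does not have yet. This is a minimal sketch under stated assumptions: the archive directory and working-tree paths are whatever you pass in (hypothetical here), files are matched with the pattern *flow.xml.gz, and the actual `git add` / `git commit` step is left to the surrounding shell script.

```python
"""Hypothetical helper: copy new NiFi flow archives into a Git working tree.

Assumptions (not from the thread): archives match *flow.xml.gz inside an
archive directory such as NIFI_HOME/conf/archive, and committing the copied
files is done separately (e.g. `git add` + `git commit` in a cron script).
"""
import shutil
from pathlib import Path
from typing import List


def list_flow_archives(archive_dir: Path) -> List[Path]:
    """Return archived flow files, newest first by modification time."""
    files = [p for p in archive_dir.glob("*flow.xml.gz") if p.is_file()]
    return sorted(files, key=lambda p: p.stat().st_mtime, reverse=True)


def copy_new_archives(archive_dir: Path, repo_dir: Path) -> List[Path]:
    """Copy archives missing from repo_dir, oldest first; return the copied paths."""
    repo_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in reversed(list_flow_archives(archive_dir)):
        target = repo_dir / src.name
        # The UID prefix makes names unique, so an existing name means "already copied".
        if not target.exists():
            shutil.copy2(src, target)  # copy2 also preserves the original mtime
            copied.append(target)
    return copied
```

Copying oldest first means that if you commit one file at a time, the Git history follows the order in which the backups were taken.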

My questions are these:
1. I would like to automate the daily backup so that it is independent of
the UI. I hope to run the backup from a cron job each night at midnight,
along with many of our other periodic administrative jobs. What tools or
APIs are available to me to automate this backup from a shell script in the
Linux environment?

2. Suppose we need to restore one of these backups of flow.xml.gz on the
same NiFi server instance that created it. I have been told about the
importance of gracefully shutting down NiFi before replacing the current
flow.xml.gz with any one of these archived backup files. What about the
state of the repositories? Will the state of the archived flow (processors,
queues, etc.) still be available in our content, provenance, and FlowFile
repositories? Is the restore simply a matter of a graceful shutdown,
swapping the archived flow.xml.gz in for the current one, and restarting?

3. In a dev/int/prod environment that runs the same NiFi version and
configuration on each NiFi server in each environment, do folks promote
flow.xml.gz files from dev and int to production? Are there "best
practices" typically employed to do that in a DevOps environment?
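The restore procedure described in question 2 (graceful shutdown, swap, restart) can be scripted. This is a minimal sketch, assuming a standard install where `bin/nifi.sh stop` and `bin/nifi.sh start` manage the instance, that `nifi.sh stop` returns only once NiFi has exited (worth confirming on your version), and that the active flow is `conf/flow.xml.gz`. The `pre-restore` file name is a hypothetical convention. The sketch leaves the content, provenance, and FlowFile repositories untouched and keeps a copy of the flow being replaced, so the restore itself can be undone.

```python
"""Hypothetical restore helper: graceful stop, swap flow.xml.gz, restart."""
import shutil
import subprocess
from datetime import datetime
from pathlib import Path
from typing import Callable, Sequence


def _run_checked(cmd: Sequence[str]) -> None:
    # Raise CalledProcessError if nifi.sh reports a failure.
    subprocess.run(list(cmd), check=True)


def restore_flow(nifi_home: Path, backup: Path,
                 run: Callable[[Sequence[str]], object] = _run_checked) -> Path:
    """Stop NiFi, install `backup` as conf/flow.xml.gz, start NiFi.

    Returns the path of the saved copy of the flow that was replaced.
    """
    nifi_sh = str(nifi_home / "bin" / "nifi.sh")
    flow = nifi_home / "conf" / "flow.xml.gz"
    if not backup.is_file():
        raise FileNotFoundError(backup)  # fail before stopping anything
    # 1. Graceful shutdown (assumes `nifi.sh stop` blocks until NiFi exits).
    run([nifi_sh, "stop"])
    # 2. Keep the flow we are about to replace.
    saved = flow.with_name(f"flow.xml.gz.pre-restore-{datetime.now():%Y%m%d-%H%M%S}")
    if flow.exists():
        shutil.copy2(flow, saved)
    # 3. Swap in the archived flow.
    shutil.copy2(backup, flow)
    # 4. Restart; the repositories are left exactly as they were.
    run([nifi_sh, "start"])
    return saved
```

The `run` parameter exists so the sequence can be dry-run (for example by passing `print`) before it is pointed at a real instance.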

Thanks in advance for your insights and help. -Jim

Re: Restoring from archived flow.xml.gz files

Posted by James McMahon <js...@gmail.com>.
Thank you very much for these insights, Bryan. I will dig into the
resources you recommended and learn more about these efforts. -Jim

On Thu, Feb 16, 2017 at 10:28 AM, Bryan Bende <bb...@gmail.com> wrote:

Re: Restoring from archived flow.xml.gz files

Posted by Bryan Bende <bb...@gmail.com>.
Hey Jim,

Great questions! The area of deployment and version control of flows
definitely needs some improvement and is going to be a focus for the
community in the near future. You may want to read through this
feature proposal [1], as I believe it will handle everything you are
looking to do. The community recently voted to create a sub-project
called NiFi Registry, which is where the flow registry will live.

As for the present day:

1) In NiFi 0.x the UI had a button for initiating a backup, and I
assume there was a REST endpoint you could call for this. In 1.x,
backups of the flow happen automatically in the background, so there
is no API for triggering one.

2) Yes, the state of all FlowFiles and everything else in the flow will
be restored after you restart with a backup, but keep in mind this could
be dangerous if you have data queued up. For example, say your current
flow has ProcessorA -> ProcessorB with data sitting between them, and
you then restore a backup that doesn't have ProcessorB: all of that data
will be gone when you start up with the restored flow.

3) You can sometimes promote the flow.xml.gz, assuming you want to
promote the entire flow and you are using the same sensitive
properties encryption key across all environments (or didn't set one
in any environment, which means they are the same). Another approach
people have taken is to use templates to deploy process groups.
Basically, you would bleed out a process group by stopping its source
processors and letting the remaining data finish processing, then
replace the process group with the new version from a template. Some
people have worked to automate this process [2][3].
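Before restarting with a backup, one way to spot the situation described in point 2 is to compare the connections (queues) in the current flow with those in the backup: any connection that exists now but not in the backup is a queue whose data would be lost. This is a minimal sketch, assuming the gzipped XML flow layout in which each queue is a <connection> element with <id> and <name> children; that layout is an assumption about this era's flow.xml.gz and may differ in other versions. It only reports connections that would disappear; whether they actually hold data still has to be checked (for example in the UI) before shutting down.

```python
"""Hypothetical pre-restore check: which queues would vanish with the backup?"""
import gzip
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Dict, List


def connection_ids(flow_file: Path) -> Dict[str, str]:
    """Map connection id -> name for every <connection> element in a flow.xml.gz."""
    with gzip.open(flow_file, "rb") as fh:
        root = ET.parse(fh).getroot()
    result = {}
    for conn in root.iter("connection"):
        conn_id = conn.findtext("id")
        if conn_id:
            result[conn_id] = conn.findtext("name") or ""
    return result


def connections_lost_on_restore(current: Path, backup: Path) -> List[str]:
    """Connections present in `current` but absent from `backup`, as 'id (name)'."""
    now, old = connection_ids(current), connection_ids(backup)
    return sorted(f"{cid} ({name})" if name else cid
                  for cid, name in now.items() if cid not in old)
```

An empty result does not make a restore safe in every respect, but a non-empty one is a clear signal to bleed out those queues first.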

Thanks,

Bryan

[1] https://cwiki.apache.org/confluence/display/NIFI/Configuration+Management+of+Flows
[2] https://github.com/aperepel/nifi-api-deploy
[3] https://github.com/ijokarumawak/nifi-deploy-process-group


On Thu, Feb 16, 2017 at 10:05 AM, James McMahon <js...@gmail.com> wrote: